Abstract

We present an analysis of distributed, negotiated commitment. This is the problem of 
ensuring that processes in a distributed negotiation commit consistently to the outcome, even 
in the face of system failures. Our analysis is based on reasoning in a temporal, epistemic logic 
about the knowledge of the processes in any solution to the problem. 

In our analysis, we present necessary levels of knowledge for commitment in settings that 
admit process or communication failures; we also consider settings that must be nonblocking or 
guarantee termination. From the necessary knowledge, we derive interprocess communication 
requirements, via a result linking knowledge and communication; this yields the underlying 
communication structure in any protocol that supports negotiated commitment. We then give a 
message lower bound for achieving commitment and several other impossibility results, showing 
that certain desirable commitment behaviours cannot be supported by any protocol. These
results are based on new techniques, which use the knowledge and communication requirements 
that we derive from the specification of negotiated commitment. 

This paper contributes a detailed and precise specification and analysis of the generalized 
distributed commitment problem. Further, the paper shows new ways in which one can use 
reasoning about knowledge to gain insight into distributed problems. 

Keywords: Commitment, fault tolerance, process knowledge, logic of knowledge, knowledge and
communication, negotiation, bidding.

Think globally, act locally.
Environmental credo and
distributed systems slogan.

1 Introduction 

Negotiation is a useful form of coordination in distributed computer systems, for dynamically 
establishing commitments to joint courses of action. Forms of computer-based negotiation have 
appeared or been proposed in the literature for various purposes, such as resource allocation,
task allocation, task scheduling, transaction atomic commitment, distributed planning, stock
trading, security pass allocation, and travel reservations (see [Maze89] for a survey). The
problem of negotiated commitment is to ensure that the processes in a distributed negotiation
commit consistently to the outcome, even in the face of unpredictability, such as system failures.
All negotiating systems must solve this problem. Previous appearances of computer-based
negotiation, however, lacked a formal definition of commitment, used an informal model of
computation, or did not clearly state systems assumptions.¹

We give a formal specification and analysis of negotiated commitment, using a modal logic 
of knowledge as the main analysis tool. Distributed problems are typically stated in terms of
global behaviours, yet processes must act locally, based on their inherently incomplete view 
of the global state of the system. The abstraction of knowledge is simply a precise way to 
reason about the extent to which a process's local state accurately reflects important aspects of 
the global state. Several researchers have demonstrated the value in reasoning formally about 
the knowledge of processes in distributed computations, as a precise way to specify, analyze, 
and derive protocols for distributed problems (see, for example, [Hadz90, Halp87, HaMo90,
HaZu89, Maze89, Maze90, MoTu88, Tutt89]). From a knowledge-theoretic perspective, a group
of processes acquires and disseminates knowledge about the system, through various events, as 
a system computation evolves. Intuitively, a process's actions depend on its knowledge, and its 
knowledge changes as a result of actions [HaFa89]. 

Reasoning formally about knowledge may offer a useful conceptual abstraction and an elegant
formalism for expressing reasoning which is often either intuitive and operational or formal
but opaque [HaMo90]. For example, informal arguments often proceed as follows: "The coordinator
must send a message to each of the others, so that each may learn the joint decision. Each
must know the decision in order to carry out the corresponding actions. Once the coordinator
has received an acknowledgement from a participant, it knows that the participant knows the
decision and ..." Such reasoning, if formalized, is transformed into complex combinatorial
arguments which obscure the relationship to the problem specification and the knowledge of
the participants. Our approach seeks to retain, in the formal reasoning process, the relationship
between the problem specification and the informal reasoning. For a distributed problem, one
first shows, formally, the knowledge each process requires to solve the problem, and then one
derives communication requirements from the knowledge requirements.

¹The exception is atomic commitment; see below.

This approach helps the protocol designer in several ways. By determining the knowledge 
required by processes to solve a distributed problem, one gains insight into the propositional 
structure of the set of possible protocols for the problem. Further, deriving interprocess com- 
munication requirements from the knowledge requirements yields insight into the underlying 
communication structure of any protocol to solve the problem. By using the knowledge and 
communication requirements derived from a problem specification, one can then show impossi- 
bility results and design protocols for the given problem. 

The problem of negotiated commitment involves two kinds of processes: (1) a distinguished
process, historically called the manager, which coordinates the commitment; and (2) the set of
contractors, or bidders. Each of the contractors chooses whether to bid or not on an announced
contract. The manager selects from among the bidding contractors to establish a dependency 
set, representing those contractors which the manager wants to commit to performing the 
announced task. The manager then relays its decision to the contractors, and the dependency 
set members commit accordingly. The period of contractor uncertainty about the manager's 
decision, and the potential for process or communication failures, makes consistent negotiated 
commitment nontrivial to achieve.²
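
The roles described above can be sketched as a single failure-free round. This is only an illustration, not a protocol from this paper: the class `Contractor`, the function `negotiate`, the `wanted` parameter, and the rule "take the first bidders in name order" are all assumptions introduced for the example. It shows the manager selecting a dependency set from among the bidders and each contractor committing only after learning the decision.

```python
from dataclasses import dataclass


@dataclass
class Contractor:
    name: str
    will_bid: bool           # private bid choice, local to the contractor
    committed: bool = False  # set once the manager's decision is learned

    def bid(self, contract: str) -> bool:
        return self.will_bid

    def learn_decision(self, dependency_set: frozenset) -> None:
        self.committed = self.name in dependency_set


def negotiate(contract: str, contractors: list, wanted: int) -> frozenset:
    """Run one failure-free round: announce, collect bids, select, relay."""
    bidders = sorted(c.name for c in contractors if c.bid(contract))
    # The manager selects only from among the contractors that bid.
    # (Hypothetical selection rule: the first `wanted` bidders by name.)
    dependency_set = frozenset(bidders[:wanted])
    for c in contractors:
        c.learn_decision(dependency_set)
    return dependency_set


crew = [Contractor("a", True), Contractor("b", False), Contractor("c", True)]
chosen = negotiate("task-1", crew, wanted=1)
```

Between calling `bid` and `learn_decision`, a contractor does not know whether it is in the dependency set; the sketch hides this window only because it assumes no failures and instantaneous delivery.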

Negotiated commitment is related, but incomparable, to the problem of atomic commitment 
in distributed transaction systems [BeHG87, Gray79, Lamp81]; they differ in two main aspects.
First, in negotiated commitment, the manager coordinates the commitment; in atomic commitment,
there need not be a single coordinator. Second, in negotiated commitment, commitment
may be established among subsets of the participating processes; in atomic commitment, a
commitment must include all processes. The specifications of these two problems reflect these
differences (cf. [Hadz90]). Throughout this paper, we comment on how results on negotiated
commitment translate to results on atomic commitment. 

Our analysis yields several kinds of results. First, we show the levels of knowledge each 
process requires to achieve different kinds of commitment behaviour. Second, we identify restrictions
on distributed computations, including underlying communication patterns, needed
to facilitate the identified states of knowledge. Third, we give a message lower bound for 
achieving commitment, based on the impossibility of achieving the required knowledge in fewer 
messages than the lower bound. Fourth, we show the impossibility of achieving commitment 
under certain assumptions of system characteristics; these results are based on the impossibility 
of the infinite communication needed to achieve the required knowledge transfer. We include 
settings that admit process recovery from failures, that must be nonblocking, or that must guarantee
protocol termination. In particular, we show the impossibility of commitment protocols
which (1) support independent process recovery, (2) are terminating under process recovery and
bounded communication time, (3) are nonblocking under permanent communication failures,
or (4) are nonblocking and terminating under communication failures.

²Asynchrony, regardless of system failures, also makes commitment nontrivial to achieve, but we focus our
analysis on systems which admit failures; cf. [Maze89].

Impossibility result (2) is new. Dwork and Skeen (1983) showed the message lower bound 
for the related problem of atomic commitment, basing their arguments on the message passing 
graphs produced by a "best-case," failure-free instance of an atomic commitment protocol.
Impossibility results (1) and (3) have proofs for atomic commitment, based upon an examination
of the plausible state transitions in atomic commitment protocols given in a finite state machine
model [Skee82, SkSt83]. One cannot always easily determine, however, how the combinatorial
proofs reflect the problem being solved. Unlike these proofs, our knowledge-theoretic proofs first
determine the propositional content [RoKa86] of the problem solutions, based on the problem
specification. From this semantic analysis, we then argue that certain message passing patterns
are needed. By using this approach, we have derived results that hold for more circumstances
than the previous results. Hadzilacos (1987, 1990) gives a knowledge-theoretic treatment of
atomic commitment. He shows the minimum knowledge levels that hold in two-phase and
three-phase atomic commitment protocols, the impossibility of nonblocking protocols under
the assumptions in (3) and (4) above, and a message lower bound. Our treatment is similar
to that of Hadzilacos, although it differs in several important ways. For example, we allow
processes to recover from process failures, and we admit systems in which messages may spend
only a limited time in transit. Our treatment of negotiated commitment attacks in more depth
a problem slightly different from atomic commitment, and the results for atomic commitment
follow naturally from the results for negotiated commitment. We prove our results using a
uniform underlying strategy which differs from that used by Hadzilacos for his results. 

The paper proceeds as follows. In Section 2, we give a model of distributed computa- 
tion. Section 3 presents a logic of knowledge in which one can express problem specifications 
that include temporal and epistemic (that is, knowledge) assertions; we also show some useful 
properties of systems. In Section 4, we specify negotiated commitment. In Section 5, we an- 
alyze the specification to determine knowledge requirements. We use these results in Section 
6 to determine the communication requirements for commitment. In Section 7, we give the 
impossibility results, which follow from some further knowledge and communication analysis. 
Section 8 discusses knowledge requirements in general nonblocking systems, and Section 9 is 
our summary. 



2 A Model of Distributed Systems 

In this section, we give a model of distributed computation in which we will ground the definition
of knowledge and our analysis of negotiated commitment.

2.1 Executions

Adapting [Hadz90], we consider distributed systems which comprise two types of elements: (1)
processes, which execute events (let $\Pi$ represent the set of $n$ processes); and (2) a communication
system, $\mathcal{N}$, which contains a set of message packets (of the form $(p, m, q, i)$, representing the
message $m$ sent from $p$ to $q$ at time $i$). The events are of two kinds: communicative and
noncommunicative. The communicative events are SEND$(m, q)$ (the executing process sends
message $m$ to process $q$, where $m \in M$, a message vocabulary) and RECV$(m, q)$ (the executing
process receives message $m$ from process $q$; $m$ may be the null message $\Lambda$ or a message from
$M$). These are the only two events by which a process may communicate externally; all other
process events are local and have no effect on the communication system.

A possible joint behaviour over time of the processes and the communication system is 
modelled by an execution (or run). Each execution e is a function mapping time to a global 
state tuple of the form $\langle time, history_{p_1}, history_{p_2}, \ldots, history_{p_n}, packets \rangle$. $time$ represents the
time at which the system is observed; $history_{p_i}$ represents the finite sequence of events executed
by process $p_i$ in execution $e$ up to the observation instant; and $packets$ is the set of message
packets in transit at that instant. As is common, we take "time" to be the natural numbers,
$\mathbb{N}$. The points of an execution set $\mathcal{E}$, Pts$(\mathcal{E})$, are $\{(e, f) \mid e \in \mathcal{E} \text{ and } f \in \mathbb{N}\}$.

Here is some notation on executions, required for the sequel. Throughout this paper, we use
the letter 'e' to refer to an execution. 'p' and 'q' refer to processes. We use other letters, notably
'f' and 'g', to refer to times. Any of these may appear superscripted or subscripted. For
$p \in \Pi$, we write $e(f,p)$ for $history_p$, $p$'s history element in the tuple at point $(e,f)$; similarly,
we write $e(f,\mathcal{N})$ for $packets$, the set of message packets in the communication system at point
$(e,f)$. $d \dashv e(f,p)$ asserts that $d$ is the last event in the sequence $history_p$ at point $(e,f)$;
$d \in e(f,p)$ indicates that event $d$ appears in the sequence; $|e(f,p)|$ indicates the number of
events in the sequence; and $e(f,p) \cdot d$ indicates the concatenation of event $d$ to the sequence.
We write $d \sqsubset (e,f+1,p)$ to say that process $p$ has just executed event $d$ at point $(e,f+1)$,
i.e., $d \sqsubset (e,f+1,p)$ iff $e(f+1,p) = e(f,p) \cdot d$. $e(f+1,p) \geq e(f,p)$ denotes that $p$'s event
sequence up to time $f+1$ in $e$ has, as a prefix, $p$'s event sequence up to $f$ in $e$. For $P, Q \subseteq \Pi$,
$e(f,\mathcal{N})[P,Q] = \{(p,m,q,i) \in e(f,\mathcal{N}) \mid p \in P \text{ and } q \in Q\}$. That is, $e(f,\mathcal{N})[P,Q]$ is the set of
messages in transit from processes in $P$ to processes in $Q$ at instant $f$ of execution $e$.
$\overline{P} \stackrel{\mathrm{def}}{=} \Pi \setminus P$.
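
The notation above can be sketched concretely over finite histories. This is a minimal illustration under assumptions not made in the paper: events are plain strings, a history is a Python tuple, and the helper names (`last_event_is`, `just_executed`, `is_prefix`, `packets_between`) are hypothetical.

```python
from typing import Iterable, Set, Tuple

Packet = Tuple[str, str, str, int]   # (p, m, q, i): m sent from p to q at time i
History = Tuple[str, ...]            # finite sequence of event names


def last_event_is(history: History, d: str) -> bool:
    """d is the last event in the sequence."""
    return len(history) > 0 and history[-1] == d


def just_executed(before: History, after: History, d: str) -> bool:
    """The later history equals the earlier one concatenated with d."""
    return after == before + (d,)


def is_prefix(before: History, after: History) -> bool:
    """The earlier history is a prefix of the later one."""
    return after[:len(before)] == before


def packets_between(packets: Iterable[Packet],
                    senders: Set[str], receivers: Set[str]) -> Set[Packet]:
    """Packets in transit from a process in `senders` to one in `receivers`."""
    return {pkt for pkt in packets if pkt[0] in senders and pkt[2] in receivers}


in_transit = {("m", "announce", "p", 0), ("p", "bid", "m", 1), ("q", "bid", "m", 1)}
to_manager = packets_between(in_transit, {"p", "q"}, {"m"})
```

Here `to_manager` plays the role of the restricted packet set for senders {p, q} and receiver {m}.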

For each $p \in \Pi$, a relation $\sim_p$ on the points in system $\mathcal{E}$ captures when $p$ has the same
event sequence in two points. For $(e,f), (e',g) \in$ Pts$(\mathcal{E})$, we write $(e,f) \sim_p (e',g)$ iff $e(f,p) =
e'(g,p)$. For process set $P \subseteq \Pi$, $(e,f) \sim_P (e',g)$ iff $(e,f) \sim_p (e',g)$ for all $p \in P$. Similarly, the
communication system is the same in both points, written $(e,f) \sim_{\mathcal{N}} (e',g)$, iff $e(f,\mathcal{N}) = e'(g,\mathcal{N})$.

Given two executions $e', e \in \mathcal{E}$ and instant $f \in \mathbb{N}$, $(e',f)$ and $(e,f)$ are historically equivalent,
written $(e',f) \equiv (e,f)$, iff the two executions have the same global states through time $f$: iff,
for all $0 \leq g \leq f$, $e'(g) = e(g)$. A point $(e',g)$ extends $(e,f)$, written $(e',g) \geq (e,f)$, iff
$(e',f) \equiv (e,f)$ and $g \geq f$. An execution $e'$ extends a point $(e,f)$ iff $(e',f) \equiv (e,f)$.

Executions conform to the following informal operational behaviour: At the beginning of
time (system initialization), the communication system is empty, and no process has executed 
any events. Each process executes at most one event between successive observation instants. 
A message is removed from the communication system if the message is received or lost. Only 
messages which were sent but not yet removed may appear in the communication system. We 
describe this behaviour axiomatically in [Maze89].
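
The relations of historical equivalence and extension defined in this subsection can be sketched over finite execution prefixes. The representation is an assumption made for illustration: an execution is a list whose entry at index g is the global state at time g, and global states are compared with `==`; the function names are hypothetical.

```python
from typing import Sequence


def historically_equivalent(e1: Sequence, e2: Sequence, f: int) -> bool:
    """Both executions have the same global state at every instant 0..f."""
    return all(e1[g] == e2[g] for g in range(f + 1))


def point_extends(e1: Sequence, g: int, e: Sequence, f: int) -> bool:
    """Point (e1, g) extends point (e, f)."""
    return g >= f and historically_equivalent(e1, e, f)


def execution_extends(e1: Sequence, e: Sequence, f: int) -> bool:
    """Execution e1 extends point (e, f)."""
    return historically_equivalent(e1, e, f)


# Two hypothetical executions that agree through time 1 and then diverge.
e = ["s0", "s1", "s2", "s3"]
e_alt = ["s0", "s1", "t2", "t3"]
```

With these two prefixes, `e_alt` extends the point `(e, 1)` but not the point `(e, 2)`.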

2.2 Systems of Executions

Informally speaking, one often characterizes the behaviours of a distributed protocol by a set
of executions $\mathcal{E}$ over $\Pi$ and $\mathcal{N}$ (see [HaFa89, Maze89] for more discussion). In order to link
communication to knowledge gain (in Section 6.1), we require an execution set to exhibit some 
natural closure properties which ensure that, if the set represents certain behaviours, then it 
represents certain other behaviours; these properties capture the ways in which one process's 
event sequence and the behaviour of the communication system affect another process's event 
sequence. We call such a closed execution set a system. Intuitively, the ability of a process p to 
execute some event should not depend on the events executed so far by other processes or on 
the behaviour of the communication system, unless p's event is a receive — a process can execute 
a RECV event only if there is an appropriate message in the communication system, and such 
a message must have been sent by some process. We give the three required closure properties 
in Appendix A. 

2.3 System Characteristics 

We now describe the system characteristics we will consider in our analysis. Informally, a
system $\mathcal{E}$ is weakly terminating if every point of $\mathcal{E}$ can be extended to a point beyond which
no process executes any more events in any extension [KoTo88].

Definition 1 A system $\mathcal{E}$ is weakly terminating if, for each point $(e,f) \in$ Pts$(\mathcal{E})$, there is $(e',g)$
extending $(e,f)$ such that, for all $p \in \Pi$ and all $(e'',h) \in$ Pts$(\mathcal{E})$ extending $(e',g)$, $e''(h,p) =
e'(g,p)$. $(e',g)$ is a terminating extension of $(e,f)$ and a terminating point of $\mathcal{E}$. $\Box$

We capture process crash failures and message loss by another set of closure properties. Informally,
a system is subject to process failures if any process subset may fail at any time. A failure
of process $p$ is modelled by a FAIL event in an event sequence for $p$. We define fail$(e, 0, P) = \emptyset$,
for all $e \in \mathcal{E}$ and $P \subseteq \Pi$ (no process is initially failed). For $f > 0$ and $P \subseteq \Pi$, we collect
into fail$(e, f, P)$ each execution $e'$ in an execution set $\mathcal{E}$ such that (1) $e'$ extends $(e, f-1)$; (2)
processes other than those in $P$ execute the same events at $(e',f)$ as at $(e,f)$; (3) each nonterminated
member of $P$ is failed; and (4) any message that $p \in P$ sends at $(e,f)$ does not appear
in $e'(f,\mathcal{N})$, and any message that $p \in P$ receives at $(e,f)$ appears in $e'(f,\mathcal{N})$.

Definition 2 A system $\mathcal{E}$ is subject to process failures if, for any $(e,f) \in$ Pts$(\mathcal{E})$:
(any process subset may fail) for any $P \subseteq \Pi$, fail$(e, f, P) \neq \emptyset$. $\Box$

Let Failed$(e,f)$ represent the set of failed processes at point $(e,f)$:
Failed$(e,f) \stackrel{\mathrm{def}}{=} \{p \mid p \in \Pi \text{ and FAIL} \dashv e(f,p)\}$.
A system is subject to process failures and recovery if, at any time, any process subset may
fail, and any subset of failed processes may recover.

Definition 3 A system $\mathcal{E}$ is subject to process failures and recovery if, for any $(e,f) \in$ Pts$(\mathcal{E})$:

• the system is subject to process failures, and

• for each nonempty $P \subseteq$ Failed$(e,f)$, there is $(e',g) \in$ Pts$(\mathcal{E})$ properly extending $(e,f)$
such that FAIL $\not\dashv e'(g,p)$, for all $p \in P$. $\Box$

Some thought will show that, in any system $\mathcal{E}$ subject to process failures and recovery, no
terminated process is failed: if $p \in \Pi$ has terminated at $(e,f) \in$ Pts$(\mathcal{E})$, then $p \notin$ Failed$(e,f)$.

A system is subject to communication failures if any subset of messages in transit at any
time may be lost. Let $M$ be a subset of the messages in transit at point $(e,f)$: $M \subseteq e(f,\mathcal{N})$.
We collect into the set lose$(e, f, M)$ the set of executions $e' \in \mathcal{E}$ such that $e'$ extends $(e, f-1)$,
$(e',f) \sim_\Pi (e,f)$, and $e'(f,\mathcal{N}) = e(f,\mathcal{N}) \setminus M$.

Definition 4 A system $\mathcal{E}$ is subject to communication failures if, for any $(e,f) \in$ Pts$(\mathcal{E})$ and
any $M \subseteq e(f,\mathcal{N})$: lose$(e, f, M) \neq \emptyset$. $\Box$

In a system which is subject to permanent communication failures, any subset of messages
in transit at any time may be lost, and, for any subset of processes at any time, it is possible
that all messages sent to that subset from that time forward will be lost.

Definition 5 A system $\mathcal{E}$ is subject to permanent communication failures if:

• the system is subject to communication failures, and

• for any $(e,f) \in$ Pts$(\mathcal{E})$ and $P \subseteq \Pi$, there is some $e_1$ extending $(e,f)$ such that, for all
$g > f$, $e_1(g,\mathcal{N})[\Pi,P] = \emptyset$. $\Box$

As we shall see, permanent communication failures preclude certain behaviour, such as 
nonblocking behaviour in commitment systems. What if we assume that communication failures 
are never permanent, but instead are transient? In a system which is subject to transient 
communication failures, any subset of messages in transit at any time may be lost but, for each 
execution, there is a point beyond which no more messages are lost. We call $(e,f)$ a lossless
point if no messages are lost at $(e,f)$, i.e., if

$$e(f,\mathcal{N}) = e(f-1,\mathcal{N}) \cup \{(p,m,q,f) \mid p,q \in \Pi \text{ and SEND}(m,q) \sqsubset (e,f,p)\} \setminus \{(p,m,q,i) \mid p,q \in \Pi \text{ and RECV}(m,p) \sqsubset (e,f,q)\}.$$

noloss$(e,f)$ is the set of executions which are historically equivalent to $e$ up to time $f$, except
that no messages are lost at time $f$: noloss$(e,f) = \{e_1 \mid e_1$ extends $(e,f-1)$ such that
$(e_1,f) \sim_\Pi (e,f)$ and $(e_1,f)$ is a lossless point$\}$. Note that $e$ is not necessarily in noloss$(e,f)$.

Definition 6 A system $\mathcal{E}$ is subject to transient communication failures if:

• the system is subject to communication failures, and

• for all $e \in \mathcal{E}$, there is $f > 0$ such that, for all $g > f$, $e \in$ noloss$(e,g)$. $\Box$
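
The lossless-point condition used in Definition 6 can be checked mechanically for a single time step. The sketch below is an illustration under two assumptions: packets are `(p, m, q, i)` tuples, and the right-hand side of the condition is read as (previous packets ∪ newly sent) minus received. The function name `is_lossless` is hypothetical.

```python
def is_lossless(prev_packets, cur_packets, sent, received) -> bool:
    """True iff the packets now in transit are exactly the earlier packets,
    plus those sent at this instant, minus those received at this instant."""
    expected = (set(prev_packets) | set(sent)) - set(received)
    return set(cur_packets) == expected


# Hypothetical step: the announcement is received and a bid is sent.
prev = {("m", "announce", "p", 0)}
sent = {("p", "bid", "m", 1)}
received = {("m", "announce", "p", 0)}
```

If the bid is still in transit after the step, the point is lossless; if the communication system is empty instead, the bid was lost at that instant.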

This is a strong assumption about the behaviour of the communication system, because it 
guarantees, for example, that if any message is sent repeatedly, it can eventually be received 
[KoTo88]. As we shall see, even though some behaviours which are impossible under permanent
communication failures are possible under transient communication failures, certain other
system behaviour is still unattainable under transient communication failures. 

In the sequel, if a system under discussion is not explicitly identified as being subject to a 
kind of failure, we assume that the system is free of those failures. 

We also model systems in which a message has a maximum lifetime in transit. Informally,
a system is $k$-transit bounded if any message sent disappears from the communication system
at most $k$ time units after being sent.

Definition 7 A system $\mathcal{E}$ is $k$-transit bounded (for some finite $k > 0$) if, for any $m \in M$,
$q, p \in \Pi$, $e \in \mathcal{E}$: if SEND$(m, q) \sqsubset (e,f,p)$, then $(p,m,q,f) \notin e(f+k,\mathcal{N})$. $\Box$
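
Definition 7 can be sketched as a check over a finite trace of the communication system. The representation is an assumption: `packets_by_time[t]` is the set of `(p, m, q, i)` packets in transit at time `t`, with `i` the send time. The check relies on the operational rule stated in Section 2.1 that only messages sent but not yet removed appear in the communication system, so a packet still present at or after time `i + k` must also have been present at time `i + k`.

```python
def is_k_transit_bounded(packets_by_time, k: int) -> bool:
    """True iff every packet seen in transit has been there fewer than k
    time units since it was sent."""
    return all(t - i < k
               for t, packets in enumerate(packets_by_time)
               for (_p, _m, _q, i) in packets)


# Hypothetical trace: a bid sent at time 1 stays in transit through time 2.
trace = [set(), {("p", "bid", "m", 1)}, {("p", "bid", "m", 1)}, set()]
```

This trace is consistent with a 2-transit bounded system but not with a 1-transit bounded one.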

Round-based protocols typically assume $k$-transit bounded systems.

3 Problem Specifications and a Logic of Knowledge 

To specify a problem to be solved, one gives a set of properties which any protocol solving 
the problem must exhibit. We will express those properties in the epistemic (knowledge) logic 
of Halpern and Moses (1990). The intuition used for defining knowledge is based on possible 
worlds: at any given moment, an agent considers several worlds, including the real one, to be 
possible, because the agent is uncertain of the state of other parts of the system. Informally,
we say that, in a given state of the system, an agent $p$ knows a fact $\phi$ if $\phi$ is true in all
worlds $p$ considers possible (that is, in all global states in which $p$ has its current local state).
Epistemic specifications are surprisingly common: any problem specification which asserts that
a property or value is private to some process is an epistemic specification, because it asserts
that the property or value depends only on the process's local state (for example, a contractor's
bid choice). We are also interested in epistemic propositions to capture assertions on the extent
to which a process's local state accurately reflects the system state, such as "the manager knows
whether the contractors have bid." The logic also allows one to express temporal properties, in
order to capture assertions about the behaviour of a system over time, such as "the protocol
eventually terminates" or "the contractor eventually knows the manager's decision."

3.1 A Logic of Knowledge

The language of the logic has the following symbols: a set $\Phi$ of primitive propositions; a finite
set $\Pi$ of process names; $\{\neg, \vee, \Diamond, \Box\}$; $\{K_x \mid x \in \Pi\}$; and $\{K_X \mid X \subseteq \Pi, X \neq \emptyset\}$. The set of
well-formed formulae (or wffs) $\mathcal{L}_\Pi(\Phi)$ is the smallest set such that (1) every member of $\Phi$ is a
well-formed formula, and (2) if $\phi$ and $\psi$ are well-formed formulae, then so are $(\neg\phi)$, $(\phi \vee \psi)$,
$\Diamond\phi$, $\Box\phi$, $K_x\phi$, and $K_X\phi$. We abbreviate $(\neg((\neg\phi) \vee (\neg\psi)))$ by $(\phi \wedge \psi)$ and $((\neg\phi) \vee \psi)$ by $(\phi \supset \psi)$.³
We interpret wffs via possible worlds semantics relative to an interpreted system (or model),
a structure $M = (\mathcal{E}, \mathcal{I})$ in which $\mathcal{E}$ is a system over $\Pi$ and $\mathcal{N}$, and $\mathcal{I}: \Phi \to 2^{\mathrm{Pts}(\mathcal{E})}$ is an interpretation
mapping each primitive proposition to the set of points in $\mathcal{E}$ in which the proposition
holds.⁴ Intuitively, $\mathcal{E}$ represents the set of possible executions of a protocol of interest, and $\mathcal{I}$
interprets primitive propositions of interest with respect to $\mathcal{E}$. The points of the system are
the possible worlds. Knowledge is based on a complete history interpretation [HaMo90]; that
is, each process's view of the system consists of all of the events it has executed, and so each
process knows as much as it can: no other encapsulation of a process's state can give a process
more knowledge.

Given a model $M$, we write $(M,e,f) \models \phi$ to express that wff $\phi$ holds in point $(e,f)$ of the
model. (If $M$ is understood from context, we write $(e,f) \models \phi$.) We define $\models$ as follows (assume
$\phi, \psi \in \mathcal{L}_\Pi(\Phi)$):

[Primitives] For $\phi \in \Phi$, $(M,e,f) \models \phi$ iff $(e,f) \in \mathcal{I}(\phi)$.
[Negation] $(M,e,f) \models (\neg\phi)$ iff $(M,e,f) \models \phi$ does not hold.
[Disjunction] $(M,e,f) \models (\phi \vee \psi)$ iff $(M,e,f) \models \phi$ or $(M,e,f) \models \psi$ (inclusively).
[Eventually] $(M,e,f) \models \Diamond\phi$ iff, for all $e' \in \mathcal{E}$ such that $(e,f) \equiv (e',f)$,
there is some $h \geq f$ such that $(M,e',h) \models \phi$. ("eventually $\phi$" holds in point $(e,f)$
iff $\phi$ is true now or will be in any execution extending $(e,f)$, i.e., iff $\phi$ will hold at some
future point no matter what the future is.)
[Henceforth] $(M,e,f) \models \Box\phi$ iff, for all $e' \in \mathcal{E}$ such that $(e,f) \equiv (e',f)$,
$(M,e',g) \models \phi$ for all $g \geq f$. ("henceforth $\phi$" holds in point $(e,f)$ iff $\phi$ holds now and
in any possible extension of $(e,f)$.)
[Process Knowledge] For $p \in \Pi$, $(M,e,f) \models K_p\phi$ iff $(M,e',g) \models \phi$, for all $(e',g) \in$ Pts$(\mathcal{E})$
such that $(e,f) \sim_p (e',g)$. ("$p$ knows $\phi$" iff $\phi$ is true in all points which look to $p$
similar to the current one.)
[Collective Knowledge] For $P \subseteq \Pi$, $(M,e,f) \models K_P\phi$ iff $(M,e',g) \models \phi$,
for all $(e',g) \in$ Pts$(\mathcal{E})$ such that $(e,f) \sim_P (e',g)$. ("the members of $P$ collectively know
$\phi$" iff $\phi$ holds in all points which the members of $P$ collectively think possible.)

A wff $\phi$ is valid in structure $M$, written $M \models \phi$, iff $(M,e,f) \models \phi$ for all points $(e,f) \in$ Pts$(\mathcal{E})$.
A problem specification is a set of wffs, each of which must be valid in any model which purports
to solve the problem.

³In the sequel, we elide the parentheses "(" and ")" in the usual way in formulae in which no ambiguity
results. Furthermore, for clarity, we sometimes use "[" for "(" and "]" for ")". We abbreviate
"$((\phi \supset \psi) \wedge (\psi \supset \phi))$" by "$\phi \equiv \psi$" (read "$\phi$ is equivalent to $\psi$"). To discuss a formula which appears repeatedly, once for each member
of a set of processes or process sets, we use the following abbreviations. For $X = \{x_1, x_2, \ldots, x_m\}$, and $\psi(x)$ a
wff mentioning $x$, $\bigwedge_{x \in X}(\psi(x))$ is defined as $\psi(x/x_1) \wedge \psi(x/x_2) \wedge \ldots \wedge \psi(x/x_m)$; that is, the conjunction of instances
of $\psi$ with all appearances of $x$ in each instance of $\psi$ replaced uniformly by an element of $X$. For example,
$\bigwedge_{x \in X} FAILED_x$ expresses that all of the processes in $X$ are failed. Similarly, $\bigvee_{x \in X}(\psi(x))$ is defined as $\psi(x/x_1) \vee
\psi(x/x_2) \vee \ldots \vee \psi(x/x_m)$. If $X$ is the empty set, then $\bigwedge_{x \in X}(\psi(x))$ and $\bigvee_{x \in X}(\psi(x))$ are defined to be trivially true.

⁴The interpretation is usually simple and straightforward, based on a mapping of each primitive proposition
to the set of global states in which it holds.

Note that processes in this logic have the introspection property: for any wff $\phi$, $P \subseteq \Pi$, and
model $M$, $M \models K_P\phi \supset K_PK_P\phi$ and $M \models \neg K_P\phi \supset K_P\neg K_P\phi$.

Lemma 8 states that, if $\psi$ is necessary for $\phi$ and $p$ knows $\phi$, then $p$ knows $\psi$. The proof of
this simple lemma illustrates the use of the possible worlds definition of knowledge.

Lemma 8 For any model $M$, $p \in \Pi$, wffs $\phi$, $\psi$: if $M \models \phi \supset \psi$, then $M \models K_p\phi \supset K_p\psi$.

Proof: Assume by way of contradiction (bwoc) that there is $(e,f) \in$ Pts$(\mathcal{E})$ such that
$(e,f) \models K_p\phi \wedge \neg K_p\psi$. Therefore, there is $(e_1,g) \in$ Pts$(\mathcal{E})$ such that $(e_1,g) \sim_p (e,f)$ and $(e_1,g) \models
\neg\psi$. By the semantics of knowledge, $(e_1,g) \models \phi \wedge \neg\psi$, violating the antecedent. $\Box$
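
The possible-worlds definition of knowledge, and the statement of Lemma 8, can be sketched on a small finite set of points. Everything concrete here is an assumption made for illustration: a point is a dictionary from process name to event history, formulae are Python predicates on points, and the event names are invented. The function `knows` covers both process knowledge (a one-element group) and collective knowledge.

```python
def knows(points, group, phi, here) -> bool:
    """K_P phi at `here`: phi holds at every point in which each member
    of `group` has the same event history as at `here`."""
    return all(phi(w) for w in points
               if all(w[p] == here[p] for p in group))


# Hypothetical points: manager m selects p or q; in w2 it has told p.
w0 = {"p": ("bid",), "m": ("select_p",)}
w1 = {"p": ("bid",), "m": ("select_q",)}
w2 = {"p": ("bid", "RECV(commit,m)"), "m": ("select_p", "SEND(commit,p)")}
points = [w0, w1, w2]


def selected_p(w) -> bool:      # plays the role of phi
    return "select_p" in w["m"]


def decided(w) -> bool:         # plays the role of psi; phi implies psi everywhere
    return len(w["m"]) > 0


# Lemma 8 on this model: wherever p knows phi, p also knows psi.
lemma8_holds = all((not knows(points, {"p"}, selected_p, w))
                   or knows(points, {"p"}, decided, w)
                   for w in points)
```

At `w0`, contractor `p` cannot distinguish `w0` from `w1`, so it does not know it was selected; at `w2`, having received the message, it does.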

3.2 Useful Properties of Models 

For any system which we will henceforth model, we include in the set of primitive propositions
the following ones: (1) for each $p \in \Pi$, $FAILED_p$, which is interpreted to mean that $p$ is currently
failed; (2) $FAILURE$, which is interpreted to mean that a process failure or a communication
failure has occurred [Hadz90]; (3) $PROCFAIL$, which is interpreted to mean that any of the
processes has failed at some point up to the current one; (4) $INIT$, representing the assertion
that the system is in an initial state [Lamp80]; and (5) for each $p \in \Pi$, $TERM_p$, which is
interpreted to mean that $p$ has terminated. Precisely,

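Definition 9 (Standard Interpretation)

Given any model $M = (\mathcal{E}, \mathcal{I})$, $\mathcal{I}$ is a standard interpretation iff

$\mathcal{I}(FAILED_p) = \{(e,f) \mid (e,f) \in \mathrm{Pts}(\mathcal{E}) \text{ and FAIL} \dashv e(f,p)\}$.
$\mathcal{I}(PROCFAIL) = \{(e,f) \mid (e,f) \in \mathrm{Pts}(\mathcal{E})$ and there is $p \in \Pi$ such that
FAIL $\in e(f,p)\}$.

$\mathcal{I}(FAILURE) = \{(e,f) \mid (e,f) \in \mathrm{Pts}(\mathcal{E})$ and, for some $p \in \Pi$, FAIL $\in e(f,p)$ or,
for some $q, p \in \Pi$, message $m$, SEND$(m,q) \in e(f,p)$,
RECV$(m,p) \notin e(f,q)$, and $(p,m,q,i) \notin e(f,\mathcal{N})\}$.

$\mathcal{I}(INIT) = \{(e,0) \mid e \in \mathcal{E}\}$.

$\mathcal{I}(TERM_p) = \{(e,f) \mid$ for all $e_1$ extending $(e,f)$, for all $g \geq f$, $e_1(g,p) = e(f,p)\}$. $\Box$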
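
Three of the clauses of Definition 9 depend only on the current point and can be sketched directly. The representation is assumed for illustration: `histories` maps each process name to its tuple of events, and the event name `RECOVER` is hypothetical (the paper names only FAIL). The sketch contrasts $FAILED_p$, which looks at the last event, with $PROCFAIL$, which looks at the whole history.

```python
def failed(histories, p) -> bool:
    """FAILED_p: the last event in p's history is FAIL."""
    return len(histories[p]) > 0 and histories[p][-1] == "FAIL"


def procfail(histories) -> bool:
    """PROCFAIL: some process has a FAIL event somewhere in its history."""
    return any("FAIL" in h for h in histories.values())


def init(time: int) -> bool:
    """INIT: the point is at time 0."""
    return time == 0


# Hypothetical point in which p failed earlier and has since recovered.
recovered = {"p": ("SEND", "FAIL", "RECOVER"), "q": ("RECV",)}
```

At this point $PROCFAIL$ holds although $FAILED_p$ does not, which matches the reading of $PROCFAIL$ as "has failed at some point up to the current one".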

We will henceforth assume that all models are standard interpretations. Note that
$PROCFAIL$, $FAILED_p$ for all $p \in \Pi$, and $FAILURE$ are all initially false (that is, false wherever $INIT$ holds) in any model.

We now identify a set of useful and important properties of formulae in interpreted systems;
propositions in the specification and analysis of negotiated commitment will exhibit these properties.
We also relate these concepts to each other.

Stable: A wff $\phi$ is stable (in $M$) if the following property holds: $M \models \phi \supset \Box\phi$. A stable
wff stays true forever after it becomes true [ChLa85]. Stability is useful for expressing
immutable properties and decisions, such as a system deadlock or the choice to commit
to a contract. Note that $FAILURE$, $PROCFAIL$, and $TERM_p$ are stable.

Local: A formula $\phi$ is local to $P$ (in $M$), for $P \subseteq \Pi$, if $M \models K_P\phi \vee K_P\neg\phi$. That is, $P$ always
knows the truth value of $\phi$ [ChMi86]. Local formulae are intended to model predicates
whose value is controlled by or locally testable by the actions of the processes to which
the formulae are local. If $P = \{p\}$, we write that $\phi$ is local to $p$ instead of $\{p\}$. Note that
$FAILED_p$ and $TERM_p$ are local to $p$.

P-failure-dissociated: A formula $\phi$ is called $P$-failure-dissociated (in $M$), for $P \subseteq \Pi$, if,
whenever $\phi$ is false, $\phi$ remains false as long as a process in $P$ is failed [Hadz90]. That
is, for any $(e,f) \in$ Pts$(\mathcal{E})$, if $(e,f) \models \neg\phi$ and FAIL $\dashv e(f+1,p)$, for any $p \in P$, then
$(e,f+1) \models \neg\phi$. If $P = \{p\}$, we write that $\phi$ is $p$-failure-dissociated instead of $\{p\}$-failure-dissociated.

P-receive-dependent: A formula $\phi$ is called $P$-receive-dependent (in $M$) if, when it is false,
it can become true only if some process in $P$ receives a nonnull message from a process
not in $P$ [Hadz90]. That is, for any $(e,f) \in$ Pts$(\mathcal{E})$, if $(e,f) \models \neg\phi$ and $(e,f+1) \models \phi$,
then RECV$(m,q) \sqsubset (e,f+1,p)$, for some $p \in P$, $m \neq \Lambda$, and $q \in \overline{P}$. If $P = \{p\}$, we write
that $\phi$ is $p$-receive-dependent instead of $\{p\}$-receive-dependent.

Nontrivial: A formula $\phi$ is called nontrivial (in $M$) iff, whenever it is false, it could stay false
forever; i.e., $M \models \neg\phi \supset \neg\Diamond\phi$. That is, for any $(e,f) \in$ Pts$(\mathcal{E})$, if $(e,f) \models \neg\phi$, then there
is $e' \in \mathcal{E}$ such that $e'$ extends $(e,f)$ and $(e',g) \models \neg\phi$ for all $g \geq f$.

Pointwise nontrivial: A formula $\phi$ is called pointwise nontrivial (in $M$) iff, whenever it is
false, it may remain false in the next time instant; i.e., for any $(e,f) \in$ Pts$(\mathcal{E})$, if
$(e,f) \models \neg\phi$, then there is $e' \in \mathcal{E}$ such that $e'$ extends $(e,f)$ and $(e',f+1) \models \neg\phi$.

Note that any nontrivial formula is perforce pointwise nontrivial, but not vice versa. Lemma 
10 shows that a nontrivial formula is eventually true exactly when it is true (note that Lemma 
10 does not hold for pointwise nontrivial formulae). 

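Lemma 10 Let $M$ be a model, $\varphi$ a nontrivial formula, and $(e,f) \in \mathrm{Pts}(\mathcal{E})$.
Then $(e,f) \models \varphi$ iff $(e,f) \models \Diamond\varphi$.

Proof: $\varphi \supset \Diamond\varphi$ holds by definition. $\Diamond\varphi \supset \varphi$ holds by contraposition of the definition of
nontrivial. □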
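The two definitions can be compared on a small, finite-horizon example. The following is a minimal sketch, not part of the original analysis: it assumes a model is given as an explicit list of equal-length runs, that each run records only the truth value of $\varphi$ at each instant, and that one run extends a point of another when the two agree up to that instant. In such a finite setting the separating case (pointwise nontrivial but not nontrivial) cannot arise, so the sketch only illustrates the checks themselves.

```python
from typing import List, Sequence

Run = Sequence[bool]  # run[t] is the truth value of phi at time t


def extends(candidate: Run, run: Run, t: int) -> bool:
    """Assumption: `candidate` extends the point (run, t) if both agree on times 0..t."""
    return list(candidate[: t + 1]) == list(run[: t + 1])


def is_nontrivial(runs: List[Run]) -> bool:
    """Whenever phi is false at (e, f), some extension keeps phi false from f onward."""
    for e in runs:
        for f, holds in enumerate(e):
            if holds:
                continue
            if not any(extends(e2, e, f) and not any(e2[f:]) for e2 in runs):
                return False
    return True


def is_pointwise_nontrivial(runs: List[Run]) -> bool:
    """Whenever phi is false at (e, f), some extension has phi false at f + 1."""
    for e in runs:
        for f in range(len(e) - 1):  # the last instant has no successor to check
            if e[f]:
                continue
            if not any(extends(e2, e, f) and not e2[f + 1] for e2 in runs):
                return False
    return True


# phi may become true at several instants, or never: both properties hold.
open_choice = [[False, False, False], [False, True, True], [False, False, True]]
# phi is forced to become true at time 1: neither property holds.
forced = [[False, True, True]]
```
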

Intuitively, locality means that the proposition is "about" the process set to which the
proposition is local. For example, if we were modelling the outcome of a coin toss by process $p$,
and wff $\varphi$ represents the proposition "$p$ has flipped a heads", then we expect $\varphi$ to be local to $p$.
Furthermore, we do not expect $\varphi$ to become true while $p$ is failed, so $\varphi$ is $p$-failure-dissociated.
We expect the outcome of $p$'s coin toss to be fair and not forced to be either heads or tails, so $\varphi$
is nontrivial (and, therefore, pointwise nontrivial). Finally, if another process $q$ must receive a
message sent by $p$ after the flip in order for $q$ to learn that $p$ flipped a heads, then the proposition
$K_q\varphi$ is $q$-receive-dependent.

As the following lemma shows, these concepts are strongly related.

Lemma 11 Given any model $M$, wff $\varphi$, and $P \subseteq \Pi$, $\varphi$ is pointwise nontrivial in $M$ if either of
the following holds:

• $M$ is subject to process failures and $\varphi$ is $P$-failure-dissociated.

• $M$ is subject to communication failures and $\varphi$ is $P$-receive-dependent.*

Further, if $M$ is subject to process failures and $\varphi$ is $P$-receive-dependent, then $\varphi$ is
$P$-failure-dissociated. □

In the interest of space, we omit this proof. The reader may find this and any other omitted
proofs in [Maze89].

Theorem 12 states that, in any weakly terminating system which is either (1) subject to
communication failures or (2) $k$-transit bounded and subject to process failures, every point
can be extended to a terminating point without any process receiving any further messages.
Koo and Toueg (1988) showed this for weakly terminating systems subject to communication
failures; instead of communication failures, we use the combination of $k$-transit boundedness
and process failures to ensure that messages may disappear without being received.

Theorem 12 Let $M$ be either (1) a weakly terminating model subject to communication
failures, or (2) a weakly terminating model which is $k$-transit bounded and subject to process
failures. For any $(e,f) \in \mathrm{Pts}(\mathcal{E})$, there is a terminating extension $(e_1,g)$ such that no process
receives a nonnull message after $(e_1,f)$.

Proof: Koo and Toueg (1988) showed the result for clause (1) as Theorem 3.1. (They proved
the result for asynchronous systems, but they note that the result holds for a system with any
synchrony property. Further, they prove the result for initial points $(e,0)$ for all $e \in \mathcal{E}$, but the
result generalizes to all points.) [Maze89, Theorem 4.2] shows the result for clause (2). □

Lemma 13 characterizes some system conditions under which a receive-dependent wff is
nontrivial and, therefore, eventually holds only when it holds already.

*This also holds for models with asynchronous processes and $\varphi$ local to $P$. We did not formally define models
with asynchronous processes, so we leave that case out of the statement of this result.

Lemma 13 In any model $M$ which is

(1) subject to permanent communication failures,

(2) weakly terminating and subject to communication failures, or

(3) weakly terminating, subject to process failures and recovery, and $k$-transit bounded,

any wff $\varphi$ which is $q$-receive-dependent, for some $q \in \Pi$, is nontrivial in $M$. (Consequently, by
Lemma 10, $M \models \varphi \equiv \Diamond\varphi$.)

Proof: Pick any $(e,f) \in \mathrm{Pts}(\mathcal{E})$ such that $(e,f) \models \neg\varphi$. (If there is none, then $\varphi$ is valid
and, therefore, nontrivial.) Because the system is subject to one of the three conditions, there
is $e_1 \in \mathcal{E}$ extending $(e,f)$ such that $(e_1,g) \models \neg\varphi$, for all $g > f$ (because $q$ does not receive at
or after $(e_1,f+1)$ any message which would establish $\varphi$; this is possible under each condition:
(1) because the message may be lost and no more messages received by $q$, (2) by Theorem 12,
or (3) by Theorem 12). Therefore, $\varphi$ is nontrivial. □

4 Specification of Negotiated Commitment 

The specification is a set of propositions which must be valid in the model of a system induced
by a protocol that solves the problem.* We call any such model a C-system.

We divide the processes in the system into two disjoint sets: the manager, $\{m\}$, and the
contractors, or bidders, $C$. Informally, each of the contractors chooses whether to bid or not
on an announced contract. The manager selects from among the bidding contractors to establish
a dependency set, representing those contractors which the manager wants to commit to
performing the announced task; contractors not in the dependency set must not carry out the
task. We represent contractor $c$'s choice to bid by a primitive proposition $\mathit{BID}_c$; we represent
its choice not to bid by $\mathit{NO\text{-}BID}_c$. We represent the manager's possible dependency set choices
by the primitive propositions $\mathit{DEPEND}^x_m$, for each nonempty $x \subseteq C$. For each $c \in C$, we define
the allowed dependencies set $\mathcal{D}_c \subseteq \{x \mid x \in 2^C \text{ and } c \in x\}$. $\mathit{NOT\text{-}CHOSE}^c_m$ represents
the manager's choice not to make $c$ a codependent. The manager records locally a decision
outcome for each contractor, either $\mathit{AWARD}^c_m$, representing that $m$ expects $c$ to carry out the
task, or $\mathit{REJECT}^c_m$, representing that $m$ expects $c$ not to carry out the task. Similarly, each
contractor records locally a decision outcome, either $\mathit{ACCEPT}_c$, representing that $c$ will carry
out the contract, or $\mathit{REFUSE}_c$, that $c$ will not carry out the contract. Informally, the processes
reach consistent commitment if, for each $c \in C$, the manager decides $\mathit{AWARD}^c_m$ and $c$
decides $\mathit{ACCEPT}_c$, or $m$ decides $\mathit{REJECT}^c_m$ and $c$ decides $\mathit{REFUSE}_c$. Each $\mathit{BID}_c$, $\mathit{NO\text{-}BID}_c$,
$\mathit{ACCEPT}_c$, and $\mathit{REFUSE}_c$ proposition is stable and local to $c$; $\mathit{BID}_c$ is also $c$-failure-dissociated.
Each $\mathit{DEPEND}^x_m$, $\mathit{NOT\text{-}CHOSE}^c_m$, $\mathit{AWARD}^c_m$, and $\mathit{REJECT}^c_m$ proposition is stable and local
to $m$; each $\mathit{DEPEND}^x_m$ is also $m$-failure-dissociated. All of these propositions are initially false.

A C-system is an interpreted system, with the primitive propositions described above, which
satisfies the following additional properties of negotiated commitment under process or
communication failures.

*The specification of negotiated commitment is more complicated than that of atomic commitment (which
requires ten properties), because of the possibility of "subatomic" dependencies and the singularity of the
coordinator.
Necessity properties

Dependent Acceptance: For all $c \in C$, $M \models \mathit{ACCEPT}_c \supset \bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$.
(An accepted contractor must be a codependent.)

Dependent Award: For all $c \in C$, $M \models \mathit{AWARD}^c_m \supset \bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$.
(An awarded contractor must be a codependent.)

No Unilateral Dependencies: For all $c \in C$, $M \models \bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m \supset \mathit{BID}_c$.
(A codependent must have bid.)

No Predetermined Bids: For all $c \in C$, $M \models \mathit{INIT} \supset \neg(\mathit{BID}_c \lor \mathit{NO\text{-}BID}_c)$.
(No contractor starts with its bid choice made.)

No Predetermined Dependencies: For all $c \in C$,
$M \models \mathit{INIT} \supset \neg(\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m \lor \mathit{NOT\text{-}CHOSE}^c_m)$.
(The manager starts without having made any dependency choices.)

Exclusivity properties

Exclusive Bid: For all $c \in C$, $M \models \neg(\mathit{BID}_c \land \mathit{NO\text{-}BID}_c)$.
(A contractor may choose only one of the two bidding options.)

Exclusive Dependencies: For all $c \in C$,
$M \models \neg(\mathit{NOT\text{-}CHOSE}^c_m \land (\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m))$.
(The manager may not both exclude $c$ from any dependency set in $\mathcal{D}_c$
and include $c$ in a dependency set.)

Nonintersecting Dependencies:
For each $c \in C$, for each pair $x, y$ such that $x, y \in \mathcal{D}_c$ and $x \neq y$,
$M \models \neg(\mathit{DEPEND}^x_m \land \mathit{DEPEND}^y_m)$.
($c$ may be involved in at most one dependency set at any time in any
one negotiation.)

Total Decision Harmony: For all $c \in C$,

$M \models \neg(\mathit{AWARD}^c_m \land \mathit{REFUSE}_c)$
$M \models \neg(\mathit{REJECT}^c_m \land \mathit{ACCEPT}_c)$
(The manager and each contractor can never decide inconsistently.)

$M \models \neg(\mathit{AWARD}^c_m \land \mathit{REJECT}^c_m)$
$M \models \neg(\mathit{ACCEPT}_c \land \mathit{REFUSE}_c)$.
(Only one of two possible decisions is allowed for each process.)

For all $x \in \bigcup_{c \in C} \mathcal{D}_c$, $c, d \in x$,

$M \models \mathit{DEPEND}^x_m \supset \neg(\mathit{AWARD}^c_m \land \mathit{REJECT}^d_m)$.
(If contractors $c$ and $d$ are both in the dependency set $x$,
then $m$ cannot decide for them inconsistently . . .)

$M \models \mathit{DEPEND}^x_m \supset \neg(\mathit{ACCEPT}_c \land \mathit{REFUSE}_d)$
(. . . and $c$ and $d$ cannot decide inconsistently with each other . . .)

$M \models \mathit{DEPEND}^x_m \supset \neg(\mathit{AWARD}^c_m \land \mathit{REFUSE}_d)$.
(. . . and $m$ cannot award to $c$ and a codependent $d$ refuse . . .)

$M \models \mathit{DEPEND}^x_m \supset \neg(\mathit{REJECT}^c_m \land \mathit{ACCEPT}_d)$.
(. . . and $m$ cannot reject $c$ and a codependent $d$ accept.)
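The exclusivity properties are all statements about a single point, so they can be checked mechanically on a snapshot of the primitive propositions. The following is a minimal sketch under stated assumptions: the `Snapshot` encoding and the `violations` helper are hypothetical, and every set containing $c$ is assumed to be an allowed dependency set for $c$, so two established dependency sets conflict whenever they share a member.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import FrozenSet, List, Set


@dataclass
class Snapshot:
    """Truth values of the primitive propositions at one point (hypothetical encoding)."""
    bid: Set[str] = field(default_factory=set)                # c with BID_c
    no_bid: Set[str] = field(default_factory=set)             # c with NO-BID_c
    depend: Set[FrozenSet[str]] = field(default_factory=set)  # x with DEPEND^x_m
    not_chosen: Set[str] = field(default_factory=set)         # c with NOT-CHOSE^c_m
    award: Set[str] = field(default_factory=set)              # c with AWARD^c_m
    reject: Set[str] = field(default_factory=set)             # c with REJECT^c_m
    accept: Set[str] = field(default_factory=set)             # c with ACCEPT_c
    refuse: Set[str] = field(default_factory=set)             # c with REFUSE_c


def violations(s: Snapshot) -> List[str]:
    """Return the names of the exclusivity properties that the snapshot violates."""
    found = []
    chosen = set().union(*s.depend)  # contractors in some established dependency set
    if s.bid & s.no_bid:
        found.append("Exclusive Bid")
    if s.not_chosen & chosen:
        found.append("Exclusive Dependencies")
    if any(x & y for x, y in combinations(s.depend, 2)):
        found.append("Nonintersecting Dependencies")
    inconsistent = (
        (s.award & s.refuse) | (s.reject & s.accept)
        | (s.award & s.reject) | (s.accept & s.refuse)
    )
    for x in s.depend:
        # Codependents must not be split between commitment and noncommitment.
        if (x & (s.award | s.accept)) and (x & (s.reject | s.refuse)):
            inconsistent |= x
    if inconsistent:
        found.append("Total Decision Harmony")
    return found


consistent = Snapshot(bid={"a", "b"}, depend={frozenset({"a", "b"})},
                      award={"a", "b"}, accept={"a", "b"})
split = Snapshot(bid={"a", "b"}, depend={frozenset({"a", "b"})},
                 award={"a"}, accept={"a"}, refuse={"b"})
```

Here `split` models codependents $a$ and $b$ in which $a$ accepts while $b$ refuses, which the codependent clauses of Total Decision Harmony rule out.
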

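Nontriviality properties

Nontrivial Process Failure:
For all $P \subseteq \Pi$, $M \models \neg\bigvee_{p \in P} \mathit{FAILED}_p \supset \neg\Diamond\bigvee_{p \in P} \mathit{FAILED}_p$.
(If a process is not failed, then it need not fail.)

Nontrivial System Failure:
$M \models \neg\mathit{FAILURE} \supset \neg\Diamond\mathit{FAILURE}$.
(If no failure has yet occurred, then a failure does not have to occur.)

Jointly Nontrivial Bid Choice:
For all $c \in C$, $M \models (\neg\mathit{BID}_c \land \neg\mathit{NO\text{-}BID}_c) \supset (\neg\Diamond\mathit{BID}_c \land \neg\Diamond\mathit{NO\text{-}BID}_c)$.
(If $c$ has not yet chosen whether to bid, then both bid choices are open.)

Jointly Nontrivial Dependencies:
For all $c \in C$, $M \models (\neg\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m \land \neg\mathit{NOT\text{-}CHOSE}^c_m) \supset$
$(\neg\Diamond\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m \land \neg\Diamond\mathit{NOT\text{-}CHOSE}^c_m)$.
(If the manager has not yet chosen to make $c$ a codependent, then $m$ is not forced
either to make $c$ a codependent or to ensure $c$ will not be a codependent.)

System-failure-free Dependency Mix:
For all nonempty $Q \subseteq C$, $P \subseteq Q$,
$M \models (\neg\mathit{FAILURE} \land [\bigwedge_{c \in Q}(\mathit{BID}_c \land \neg[\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m \lor \mathit{NOT\text{-}CHOSE}^c_m])]) \supset$
$(\neg\Box\neg[\neg\mathit{FAILURE} \land (\bigwedge_{c \in P}[\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m]) \land (\bigwedge_{c \in Q \setminus P} \mathit{NOT\text{-}CHOSE}^c_m)])$.
(If a system failure has not yet occurred and $m$ has not yet made its dependency choices
about some bidding subset $Q$ of $C$, then, for all subsets $P$ of those contractors, there is
an extension in which a system failure still has not occurred and each member of $P$ is a
codependent and each member of $Q$ not in $P$ is not chosen.)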
Decision completion properties

Failure or Decision:
$$M \models \Diamond\Big(\mathit{FAILURE} \lor \Big[\bigwedge_{c \in C} (\mathit{BID}_c \lor \mathit{NO\text{-}BID}_c) \land \Big[\Big(\Big[\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m\Big] \land [\mathit{AWARD}^c_m \land \mathit{ACCEPT}_c]\Big) \lor \big(\mathit{NOT\text{-}CHOSE}^c_m \land [\mathit{REJECT}^c_m \land \mathit{REFUSE}_c]\big)\Big]\Big]\Big).$$
(If there are no failures, then all contractors should make a bid choice, all codependents
should establish commitments, and all noncodependents should establish "noncommitments".)

Post-Failure Termination: For all $(e,f) \in \mathrm{Pts}(\mathcal{E})$,
if $\mathit{Failed}(e,f) = \emptyset$ and $\mathit{noloss}(e,f) \neq \emptyset$, then there is $e_1 \in \mathit{noloss}(e,f)$ and $h$ such
that there are no process or communication failures in $(e_1,g)$ for $f < g < h$
(i.e., $\mathit{Failed}(e_1,g) = \emptyset$ and $e_1 \in \mathit{noloss}(e_1,g)$) and
$(e_1,h) \models \bigwedge_{c \in C}(\mathit{ACCEPT}_c \lor \mathit{REFUSE}_c) \land \bigwedge_{c \in C}(\mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)$.
(If there are presently no failures, then it is possible for no process or communication
failures to occur for sufficiently long that all processes decide.)

The following resulting properties of C-systems are straightforward: if no process has failed
yet, then no process is forced to fail; there are executions without failures; for each contractor
$c \in C$, there is an execution in which $c$ establishes commitment without any system failures
having occurred and an execution in which $c$ establishes noncommitment (i.e., refuses) without
any system failures having occurred; for each $c \in C$, neither commitment nor noncommitment is
predetermined; establishing commitment and establishing noncommitment is each possible for
each contractor in any C-system; each of the $\mathit{BID}_c$, $\mathit{NO\text{-}BID}_c$, $\mathit{DEPEND}^x_m$, and $\mathit{NOT\text{-}CHOSE}^c_m$
propositions is nontrivial (and therefore each is pointwise nontrivial); and $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$ is
$m$-failure-dissociated and pointwise nontrivial.

5 Initial Knowledge Analysis 

Given the specification of negotiated commitment, we now wish to determine levels of knowledge
which each process needs to commit. It is straightforward to show the following simple
knowledge requirements. The first one, for example, states that, for a dependency set $x$ which
includes contractor $c$, if a process $p$ knows that the dependency set is established, then $p$ knows
that $c$ bid.

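Lemma 14 For any C-system $M$, $c \in C$, $p \in \Pi$,

1. $M \models K_p(\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m) \supset K_p\mathit{BID}_c$.

2. $M \models K_p\mathit{AWARD}^c_m \supset K_p(\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m)$.

3. $M \models K_p\mathit{AWARD}^c_m \supset K_p\mathit{BID}_c$.

4. $M \models K_p\mathit{ACCEPT}_c \supset K_p(\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m)$.

5. $M \models K_p\mathit{ACCEPT}_c \supset K_p\mathit{BID}_c$.

Proof: Follows from: (1) Lemma 8 and No Unilateral Dependencies, (2) Lemma 8 and
Dependent Award, (3) items 2 and 1, (4) Lemma 8 and Dependent Acceptance, and (5)
items 4 and 1. □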
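As a worked instance of how the later items follow from the earlier ones, item 3 can be read as chaining item 2 with item 1 by transitivity of implication. The display below is only a sketch of that step; the notation $\mathit{DEPEND}^x_m$ and $\mathcal{D}_c$ is as in the specification of Section 4.

```latex
\begin{align*}
K_p\,\mathit{AWARD}^c_m
  &\supset K_p\Big(\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m\Big) && \text{(item 2)}\\
  &\supset K_p\,\mathit{BID}_c && \text{(item 1)}
\end{align*}
```

Item 5 follows in the same way from items 4 and 1.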

This matches our intuition about the problem; the important point is that we are able to
formalize and validate that intuition directly.* These knowledge requirements are also enough to
show the message lower bound. As we shall see in following sections, the knowledge requirements
(and the corresponding communication requirements) are not always so simple.

6 Communication Requirements 

One of the goals of a problem analysis is to determine the message passing structure of protocols
to solve the problem. We now derive some communication requirements for commitment
solutions, based on the knowledge requirements of Lemma 14. First, we show the following
underlying communication structure for any negotiated commitment protocol: if a set $P \subseteq C$ of
contractors has accepted, then there must have been a message chain from each $c \in P$ to $m$ and a
subsequent message chain from $m$ to each $c$. Then we show the following message lower bound
for commitment: if some subset $P$ of contractors has accepted, then the number of nonnull
RECV events is at least twice the number of contractors which accepted; and if the manager has
awarded to some subset $P$ of contractors, then the number of nonnull RECV events is at least
the number of awarded contractors.

To get these results, we need a theorem given by Mazer (1989) which identifies circumstances
under which processes in faulty distributed systems must communicate for one process to gain
knowledge about another. One can use this theorem as a high-level link between knowledge
and communication; the theorem hides detailed, combinatorial arguments from the high-level
view.

*The analogous result for atomic commitment is that, for process $p$ to commit, $p$ must know that every site
voted to commit the transaction [Hadz90].

6.1 Message Chain Theorem

Informally, this result says that, if at some time a proposition $\varphi$ about process $p$ is false and at
some later time another process $q$ knows that $\varphi$ is true, then $q$ received a message through some
chain of message passing which originated at $p$, given one of the following conditions: processes
can crash-fail and $\varphi$ cannot become true while $p$ is failed; or messages can be lost and $\varphi$ is never
forced to become true.* The result is formalized as follows.

Given an execution $e$, a message chain from process $p$ to process $q$ in interval $(e,f)$ to $(e,g)$
is a sequence of send/receive pairs such that ($f < f_1$; $f_i < f_{i+1}$, for $1 \le i < 2n$; $f_{2n} \le g$):
$\mathit{SEND}(m_1,p_1) \sqsubset (e,f_1,p)$; $\mathit{RECV}(m_1,p) \sqsubset (e,f_2,p_1)$; $\mathit{SEND}(m_2,p_2) \sqsubset (e,f_3,p_1)$;
$\mathit{RECV}(m_2,p_1) \sqsubset (e,f_4,p_2)$; ... $\mathit{SEND}(m_n,q) \sqsubset (e,f_{2n-1},p_{n-1})$; $\mathit{RECV}(m_n,p_{n-1}) \sqsubset (e,f_{2n},q)$.

The abbreviation $P \rightarrow Q$ indicates a message chain of length at least one from $P$ to $Q$
(execution and interval will be clear from context). We abbreviate $\{p\} \rightarrow \{q\}$ as $p \rightarrow q$.

For any model $M$, $e \in \mathcal{E}$, nonempty process sets $P \subseteq \Pi$ and $Q \subseteq \Pi$ such that $P \cap Q = \emptyset$,
and wff $\varphi$ local to $P$, a $\varphi$-message chain from $P$ to $Q$ in interval $(e,f)$ to $(e,i)$ is a message chain
from some $p \in P$ to some $q \in Q$ in an interval $(e,g)$ to $(e,i)$ such that $f < g$, $(M,e,g-1) \models \neg\varphi$,
$(M,e,g) \models \varphi$, and $(M,e,i) \models K_q\varphi$.
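The definition of a message chain can be sketched as a reachability check over a log of delivered messages. This is an illustrative sketch only: the `Message` record and `has_message_chain` function are hypothetical names, the log is assumed to contain only nonnull messages that were actually received, and the truth of $\varphi$ (needed for a $\varphi$-message chain) is not modelled.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sent_at: int
    sender: str
    received_at: int
    receiver: str


def has_message_chain(messages: List[Message], p: str, q: str, f: int, g: int) -> bool:
    """True if some chain of received messages leads from p to q within (e, f) to (e, g)."""
    # reached_at[x] is the earliest time at which x lies on a chain starting at p.
    reached_at: Dict[str, int] = {p: f}
    # Processing in send order is enough: any message that informs a sender
    # is received, and hence sent, strictly before that sender's own send.
    for msg in sorted(messages, key=lambda m: m.sent_at):
        start = reached_at.get(msg.sender)
        if start is None or not (start < msg.sent_at < msg.received_at <= g):
            continue
        if msg.received_at < reached_at.get(msg.receiver, g + 1):
            reached_at[msg.receiver] = msg.received_at
    return q != p and q in reached_at


# c sends to a relay r, which later forwards to the manager m.
relayed = [Message(1, "c", 2, "r"), Message(3, "r", 4, "m")]
# The relay forwards before it hears from c, so no chain from c reaches m.
too_early = [Message(1, "r", 2, "m"), Message(3, "c", 4, "r")]
```
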

Here is the knowledge gain result. 

Theorem 15 (The Message Chain Theorem) [Maze89, Maze90]

Fix a model $M$, any nonempty $P \subseteq \Pi$ and $Q \subseteq \Pi$ such that $P \cap Q = \emptyset$, and wff $\varphi$ local to
$P$ in $M$. Further, let one of the following two conditions hold: (1) $M$ is subject to process
failures and $\varphi$ is $P$-failure-dissociated in $M$, or (2) $M$ is subject to communication failures and
$\varphi$ is pointwise nontrivial in $M$. Fix point $(e,f)$ in $\mathcal{E}$ and $i > f$ such that $(M,e,f) \models \neg\varphi$ and
$(M,e,i) \models K_Q\varphi$.

Then there is a $\varphi$-message chain from $P$ to $Q$ in $(e,f)$ to $(e,i)$. □

Recall from Lemma 11 that, if $\varphi$ is $P$-failure-dissociated in a model which is subject to
process failures, then $\varphi$ is pointwise nontrivial. Pointwise nontriviality is a key concept in
understanding the Message Chain Theorem. Intuitively, pointwise nontriviality causes uncertainty;
$Q$ requires a $\varphi$-message chain in order to learn $\varphi$ because $Q$ must be able to distinguish
between worlds in which $\varphi$ holds and those in which $\varphi$ does not hold (these latter worlds are
possible because $\varphi$ earlier did not hold and was not forced to hold). Without the message
chain, $Q$ has no basis upon which to make the required distinction. See [Maze89, Maze90] for
the detailed proof.

Corollary 16 and Lemma 17 below use the Message Chain Theorem and some earlier lemmas
to prove results about knowledge gain in three kinds of systems which we will examine again
later (in the context of commitment systems).

Corollary 16 Let $M$ be any model which is either (1) subject to process failures or (2) subject
to communication failures. For any $P \subseteq \Pi$ and wff $\varphi$ which is local to $Q \subseteq P$, $Q$-receive-dependent,
and initially false, $K_P\varphi$ is $P$-receive-dependent.

Proof: We treat the two cases separately:

process failures: By Lemma 11, $\varphi$ is $Q$-failure-dissociated. Therefore, by the Message Chain
Theorem, $\varphi$ requires a message chain $Q \rightarrow P$, so $K_P\varphi$ is $P$-receive-dependent.

communication failures: By Lemma 11, $\varphi$ is pointwise nontrivial in $M$. Then, by the Message
Chain Theorem, $\varphi$ requires $Q \rightarrow P$. □

*Chandy and Misra (1986) showed such a result for systems with asynchronous processes, regardless of failures.
We use a result which holds for systems with failures, regardless of synchrony.

Lemma 17 In any model $M$ which is

1. subject to permanent communication failures,

2. weakly terminating and subject to communication failures, or

3. weakly terminating, subject to process failures and recovery, and $k$-transit bounded,

for any $P \subseteq \Pi$ and wff $\varphi$ which is local to $Q \subseteq P$, $Q$-receive-dependent, and initially false,
$K_P\Diamond\varphi$ is $P$-receive-dependent.

Proof: By Corollary 16, $K_P\varphi$ is $P$-receive-dependent. By Lemmas 13 and 10, $\varphi \equiv \Diamond\varphi$.
Therefore, $K_P\Diamond\varphi$ is $P$-receive-dependent. □

6.2 Communication Structure and Message Lower Bound

We now determine the communication structure, and a lower bound on the number of messages
required (excluding the contract announcements) to establish commitment, in any negotiated
commitment protocol. These results are important, because they tell the protocol designer
that any protocol that supports negotiated commitment must ensure that at least the lower
bound number of messages passes among processes, according to the determined communication
structure; further, the propositional content of these messages comes from the knowledge
requirements.

Lemma 14 identified some of the knowledge a contractor needs to accept or a manager
needs to award. We now use the Message Chain Theorem, No Predetermined Bids, No
Predetermined Dependencies, the pointwise nontriviality of $\mathit{BID}_c$ and $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$,
the specification of the primitive propositions, and the failure assumptions for C-systems, to
derive communication requirements from the knowledge requirements. It is easy to show, for
systems with process failures or communication failures, that, for process $p$ to know that
contractor $c$ bid, there must be a message chain from $c$ to $p$ (Lemma 18), and for $c$ to know that
$m$ selected $c$ as a codependent, there must be a subsequent message chain from $m$ to $c$ (Lemma
19).

Lemma 18 For any C-system $M$, $c \in C$, $p \in \Pi$ such that $p \neq c$, $e \in \mathcal{E}$, and $i \geq 0$,
if $(e,i) \models K_p\mathit{BID}_c$, then there is a $\mathit{BID}_c$-message chain $c \rightarrow p$ in $(e,0)$ to $(e,i)$. □

Lemma 19 For any C-system $M$, $c \in C$, $p \in \Pi$ such that $p \neq m$, and $(e,i) \in \mathrm{Pts}(\mathcal{E})$,
if $(e,i) \models K_p\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$, then there is a $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$-message chain $m \rightarrow p$
in $(e,0)$ to $(e,i)$. □

Therefore, each of $K_p\mathit{BID}_c$ and $K_p\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$ is $p$-receive-dependent.

From this simple analysis and Lemma 14, we get the underlying communication structure
of any protocol which supports negotiated commitment: if a set $P \subseteq C$ of contractors has accepted,
then there must have been a message chain from each $c \in P$ to $m$ and a subsequent message
chain from $m$ to each $c$.* Adapting the linear two-phase protocol for atomic commitment (see
[Gray79, BeHG87]) to negotiated commitment illustrates that the message chains required may
overlap; for example, the message chain from the manager to a contractor can include as a
subchain the message chain from the manager to a contractor earlier in the linear order. For
atomic commitment, the analogous result is that, for $p$ to commit, there must be a message
chain from every other process to $p$; further, for the others to commit, there must be a message
chain from $p$ to each other process. Centralized and linear two-phase commit protocols illustrate
that the chains may overlap or converge through a single coordinator; the decentralized
two-phase commit protocol illustrates that the message chains may be independent. Further, all of
the "flexible" two-phase atomic commitment protocols discussed informally by Burger (1989)
implicitly respect this underlying communication structure.

Finally, we can get our lower bound result. First, if some subset P of contractors has 
accepted, then the number of nonnull RECVs is at least twice the number of contractors which 
accepted. Further, if the manager has awarded to some subset P of contractors, then the 
number of nonnull RECV events is at least the number of awarded contractors. Note that the
proof is couched in terms of the high-level concepts of knowledge and message chains; much of 
the combinatorial detail is hidden under these concepts. 

Theorem 20 For any C-system $M$, $(e,f) \in \mathrm{Pts}(\mathcal{E})$, $P \subseteq C$,

1. if $(e,f) \models \bigwedge_{c \in P} \mathit{ACCEPT}_c$,
then the number of nonnull RECV events in $(e,f)$ is at least $2\,|P|$.

2. if $(e,f) \models \bigwedge_{c \in P} \mathit{AWARD}^c_m$,
then the number of nonnull RECV events in $(e,f)$ is at least $|P|$.

*One could extend this analysis as follows. It is easy to show that $M \models K_p\mathit{DEPEND}^x_m \supset \bigwedge_{c \in x} K_p\mathit{BID}_c$.
Therefore, when dependency set $x$ has accepted, for each pair $c, d \in x$, $K_c\mathit{BID}_d$ holds and $K_d\mathit{BID}_c$ holds.
Therefore there must be a $\mathit{BID}_c$-message chain from $c$ to $d$ and a $\mathit{BID}_d$-message chain from $d$ to $c$. We do not
pursue this line of reasoning, because the $\mathit{BID}_c$-message chain from $c$ to $d$ is embedded in the $\mathit{BID}_c$-message
chain from $c$ to $m$ concatenated with the $\mathit{DEPEND}^x_m$-message chain from $m$ to $d$ (and symmetrically from $d$ to

Proof:

1. We note that $(e,f) \models \bigwedge_{c \in P}(K_c\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m)$ (by locality of $\mathit{ACCEPT}_c$ and Lemma
14) and that a $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$-message chain exists from $m$ to each $c$ (by Lemma
19). Furthermore, because $M \models (\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m) \supset K_m\mathit{BID}_c$, there must be a
$\mathit{BID}_c$-message chain $c \rightarrow m$ for each $c$ (by Lemma 18). The $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$-message chain
from $m$ to any $c$ must strictly follow the $\mathit{BID}_c$-message chain from that $c$ to $m$ (i.e., there
are consecutive message chains $c \rightarrow m \rightarrow c$).* Therefore, each $c \in P$ must send a
message which is received (along $c$'s $\mathit{BID}_c$-message chain to $m$), and each $c \in P$ must
receive a nonnull message (along the $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$-message chain from $m$).

Let $P = \{c_1, c_2, \ldots, c_k\}$. For each $c_i \in P$, call the message it sends on its $\mathit{BID}_{c_i}$-message
chain to $m$ $m_{c_i,B}$; also call the message it receives on the $\bigvee_{x \in \mathcal{D}_{c_i}} \mathit{DEPEND}^x_m$-message chain
from $m$ $m_{c_i,D}$. Therefore, in $(e,f)$, there are the following nonnull, received messages:
$m_{c_1,B}, m_{c_2,B}, \ldots, m_{c_k,B}$
and
$m_{c_1,D}, m_{c_2,D}, \ldots, m_{c_k,D}$.
Therefore, there are $2\,|P|$ distinct, nonnull messages received unless, for some distinct
$c_i, c_j \in P$, $m_{c_i,B} = m_{c_j,D}$; that is, if there are fewer than $2\,|P|$ nonnull messages received,
then there must be at least one $c_i, c_j$ pair such that the bid message sent by $c_i$ is the
dependency message received by $c_j$. We now show that, for such a $c_i, c_j$ pair, either
$(e,f) \models \neg\mathit{ACCEPT}_{c_i}$, contradicting the statement of the theorem, or other messages
must be received in $(e,f)$.

$m_{c_j,D}$ is the message on the $\bigvee_{x \in \mathcal{D}_{c_j}} \mathit{DEPEND}^x_m$-message chain from $m$ to $c_j$ which allows
$c_j$ to attain $K_{c_j}(\bigvee_{x \in \mathcal{D}_{c_j}} \mathit{DEPEND}^x_m)$. Therefore, $c_i$ receives a nonnull message $m$ on the
$(\bigvee_{x \in \mathcal{D}_{c_j}} \mathit{DEPEND}^x_m)$-message chain from $m$ to $c_j$. We note that $m \neq m_{c_i,D}$ because $c_i$
receives $m$ before initiating the $\mathit{BID}_{c_i}$-message chain from $c_i$ to $m$ (with $m_{c_i,B}$), and
must receive $m_{c_i,D}$ after sending $m_{c_i,B}$. Thus, we have an additional nonnull message $m$
received, which compensates for the distinct message "lost" by the fact that $m_{c_i,B} = m_{c_j,D}$.

Therefore, we conclude that at least $2\,|P|$ nonnull messages are received.

2. We know that $(e,f) \models \bigwedge_{c \in P} K_m\mathit{BID}_c$ (by Lemma 14). Therefore, there is a $\mathit{BID}_c$-message
chain $c \rightarrow m$ in $(e,0)$ to $(e,f)$ for each $c \in P$. Therefore, each $c \in P$ must send a nonnull
message which is received (along $c \rightarrow m$), so there are at least $|P|$ nonnull messages
received. □
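The bounds of Theorem 20 can be compared against a failure-free, centralized exchange. The sketch below is hypothetical and is not a protocol given in this paper: each bidder sends one bid to the manager, and the manager answers each bidder once, with the contract announcements excluded as in the text. When every bidder is chosen, the number of receipts meets the $2\,|P|$ bound exactly.

```python
from typing import List, Set, Tuple

Recv = Tuple[str, str, str]  # (receiver, sender, payload) of one nonnull RECV event


def centralized_round(bidders: Set[str], chosen: Set[str]) -> List[Recv]:
    """Failure-free run: every bidder sends a bid to m; m then answers each bidder."""
    received: List[Recv] = []
    for c in sorted(bidders):
        received.append(("m", c, "BID"))
    for c in sorted(bidders):
        verdict = "AWARD" if c in chosen else "REJECT"
        received.append((c, "m", verdict))
    return received


def accept_lower_bound(accepted: Set[str]) -> int:
    """Theorem 20, part 1: at least 2|P| nonnull RECV events once P has accepted."""
    return 2 * len(accepted)


def award_lower_bound(awarded: Set[str]) -> int:
    """Theorem 20, part 2: at least |P| nonnull RECV events once m has awarded to P."""
    return len(awarded)


log = centralized_round({"a", "b", "c"}, chosen={"a", "b"})
bids_received = [r for r in log if r[2] == "BID"]
```

In this run the rejected bidder `c` accounts for the two receipts above the bound for the accepted set.
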

*This is true because, by the definition of a $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$-message chain, $\bigvee_{x \in \mathcal{D}_c} \mathit{DEPEND}^x_m$ must hold
at the point $(e,f)$ when $m$ sends the first message in the chain. In that case, $(e,f) \models K_m\mathit{BID}_c$, and thus the
RECV by which $m$ learns $\mathit{BID}_c$ must occur earlier than time $f$.

For an atomic commitment, one in which all contractors commit, $|P| = n - 1$; Theorem 20
tells us that any execution in which all participants decide to commit requires at least $2(n - 1)$
nonnull messages received. This matches the known result for atomic commitment, given first
by Dwork and Skeen (1983). To show their lower bound result, Dwork and Skeen (1983) use
a tightly synchronous computation model with permanent process failures and an argument
based on the message passing graphs produced by a "best-case," failure-free instance of an
atomic commitment protocol. We suggest that the knowledge-theoretic approach yields a more
elegant and intuitive proof and a more generally applicable result (applicable under process
failures, communication failures, or asynchrony). Hadzilacos (1990) also gives a
knowledge-theoretic proof of this result for atomic commitment. Although his proof differs significantly
in approach from ours, Hadzilacos also determines requisite knowledge levels for decision and
message passing requirements for attaining the required knowledge, from which the lower bound
follows.

7 Impossibility Results 

An impossibility result proves that no protocol can guarantee the behaviour addressed in the
result; for example, we will soon show that no protocol can guarantee that an undecided process,
recovering from a failure, can decide consistently without receiving further messages.
Impossibility results save the protocol designer from the futile effort of writing a protocol to support
the desired behaviour.

All of our impossibility proofs have the same form: (1) determine that the desired commitment
behaviour requires arbitrarily deeply nested knowledge in the specific type of system;
(2) determine that establishing that knowledge requires arbitrarily many consecutive message
chains in any protocol; and (3) argue that the communication is unattainable. This common
structure demonstrates the power of the knowledge-theoretic approach.

7.1 Independent Recovery 

Independent recovery is the ability of a process to decide consistently, upon recovery from a
process failure, without executing any nonnull receive events. Independent recovery is desirable,
because it allows failed processes, if they recover, to decide consistently based on local state,
without blocking and without communicating with others; processes that are not failed can
ignore failed ones. The lack of independent recovery means that at least one "live" process
must have the knowledge and longevity to assist recovering processes in deciding, regardless of
how long those processes can remain failed.

In this section, we show that no negotiated commitment protocol can support independent
recovery. The proof of this result proceeds as follows. First, we show that all processes have
decided at any terminating point of a system which supports independent recovery. Then we
derive knowledge levels required to establish a commitment (award or accept) in such a system.
Then we show that certain sequences of consecutive message chains are needed to attain the
relevant knowledge levels. Finally, we argue that establishing a commitment in a system which
supports independent recovery requires an infinite sequence of consecutive message chains.

In particular, we show that, in a C-system which supports independent recovery, (1) an
accepting contractor $c$ must have arbitrarily deeply nested knowledge about the manager's
knowledge about $c$'s knowledge that $m$ made $c$ a codependent, and (2) an awarding manager must
have arbitrarily deeply nested knowledge about $c$'s knowledge that $m$ made $c$ a codependent.
We show this, in Lemma 25, by induction on the knowledge nesting level. Then the Message
Chain Theorem allows us to show that the required knowledge cannot be gained in finite time
(Theorem 27), by showing that arbitrarily many consecutive message chains are needed to gain
the required knowledge (Lemma 26). In order to show Lemma 25, we must show (1) how $m$'s
award to $c$ depends on $c$'s acceptance knowledge and $c$'s acceptance depends on $m$'s award
knowledge (Lemma 23), and (2) that each of the nested levels of knowledge in Lemma 25 can
be attained only by message receipt (Lemma 24). Lemma 23 shows that (1) if $c$ must know a
$c$-receive-dependent proposition $\varphi$ in order to accept, then an awarding $m$ must know that $c$
knows $\varphi$, and (2) if $m$ must know some $m$-receive-dependent proposition $\varphi$ in order to award,
then, in order to accept, $c$ must know that $m$ knows $\varphi$. Lemma 23 and Lemma 24 allow us to
show the interleaved knowledge requirement in Lemma 25, using, as a basis, the fact that an
accepting $c$ must know that it is a codependent (Lemma 14).
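The shape of the interleaved knowledge requirement can be made concrete by generating the nested formulas. This is a sketch under assumptions: the function names are hypothetical, the base proposition is written as the plain string `DEPEND`, and the reading that each additional knowledge operator costs one further consecutive message chain is a simplification of the proof outline above, not a restatement of Lemma 26.

```python
from typing import List


def nested_knowledge(outer: str, inner: str, depth: int, base: str) -> str:
    """Prefix `base` with `depth` knowledge operators alternating outer, inner, outer, ..."""
    agents = [outer if i % 2 == 0 else inner for i in range(depth)]
    return "".join(f"K_{a} " for a in agents) + base


def chain_sequence(outer: str, inner: str, depth: int) -> List[str]:
    """Consecutive message chains, in temporal order, assumed to build the nesting.

    Assumption: the innermost operator is established first, and each learner
    gains its level of knowledge through a chain from the other process.
    """
    agents = [outer if i % 2 == 0 else inner for i in range(depth)]
    other = {outer: inner, inner: outer}
    return [f"{other[a]} -> {a}" for a in reversed(agents)]


formula = nested_knowledge("c", "m", 3, "DEPEND")
chains = chain_sequence("c", "m", 3)
```

Because the nesting depth is unbounded, the list of chains grows without bound, which is the intuition behind the impossibility argument.
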

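Definition 21 A C-system $M$ supports independent recovery if $M$ is subject to process failures
and recovery, and for all $(e,f) \in \mathrm{Pts}(\mathcal{E})$, $f > 0$,

• if, for $c \in C$, $\mathit{FAIL} \sqsubset e(f-1,c)$ and $\mathit{FAIL} \not\sqsubset e(f,c)$,
then either

(1) there is $g > f$ such that $(e,g) \models \mathit{ACCEPT}_c \lor \mathit{REFUSE}_c$
and $\mathit{RECV}(m,p) \not\sqsubset e(g,c) - e(f-1,c)$
for all messages $m \neq \lambda$ and $p \in \Pi \setminus \{c\}$, or

(2) there is a $g > f$ such that $\mathit{FAIL} \sqsubset e(g,c)$

and

• if $\mathit{FAIL} \sqsubset e(f-1,m)$ and $\mathit{FAIL} \not\sqsubset e(f,m)$,
then either

(1) there is $g > f$ such that $(e,g) \models \bigwedge_{c \in C}(\mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)$
and $\mathit{RECV}(m,p) \not\sqsubset e(g,m) - e(f-1,m)$
for all messages $m \neq \lambda$ and $p \in \Pi \setminus \{m\}$, or

(2) there is a $g > f$ such that $\mathit{FAIL} \sqsubset e(g,m)$. □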
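For a single process, the two alternatives in Definition 21 can be checked on a finite trace. The following is a minimal sketch under stated assumptions: the trace is treated as the complete execution, each instant records whether the process is failed, whether it received a nonnull message at that instant, and whether it has decided, and the decision instant is allowed to coincide with the recovery instant. The function name is hypothetical.

```python
from typing import List


def recovers_independently(failed: List[bool], received: List[bool],
                           decided: List[bool]) -> bool:
    """Check, for every recovery instant f, that the process either decides
    with no nonnull receive since recovering, or later fails again."""
    n = len(failed)
    for f in range(1, n):
        if not (failed[f - 1] and not failed[f]):
            continue  # f is not a recovery instant
        decides_alone = any(
            decided[g] and not any(received[f : g + 1]) for g in range(f, n)
        )
        fails_again = any(failed[g] for g in range(f + 1, n))
        if not (decides_alone or fails_again):
            return False
    return True


# Recovers at time 2 and decides at time 3 from local state alone.
local_decision = recovers_independently(
    failed=[False, True, False, False],
    received=[False, False, False, False],
    decided=[False, False, False, True],
)
# Recovers at time 2 but needs a message at time 2 before deciding.
needs_help = recovers_independently(
    failed=[False, True, False, False],
    received=[False, False, True, False],
    decided=[False, False, False, True],
)
```
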

We first prove that, in a C-system subject to process failures and recovery, all processes 
have decided at any terminating point. 

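Lemma 22 In a C-system subject to process failures and recovery, if $(e,f)$ is a terminating
point, then $(e,f) \models \bigwedge_{c \in C}(\mathit{ACCEPT}_c \lor \mathit{REFUSE}_c) \land \bigwedge_{c \in C}(\mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)$.

Proof: Assume, by way of contradiction, that $(e,f)$ is a terminating point, so $(e,f) \models \bigwedge_{p \in \Pi} \mathit{TERM}_p$, but
$(e,f) \models \neg[\bigwedge_{c \in C}(\mathit{ACCEPT}_c \lor \mathit{REFUSE}_c) \land \bigwedge_{c \in C}(\mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)]$.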
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Because no process is failed at a terminating point, Failed{ejf) = 0. Without loss of 
generaJityj assume noloss{ej/ -|- 1) ^ 0 (this is without loss of generality because there is 
a finite number of messagesj say j, in transit at (e,/), so one of {e,f + 1), (e,/ + 2), . . 
(cj/ + J + 1) must be lossless). Note that for any si £ noloss{e,/ + 1), Failed{eijf + 1) = 
0 and (eij/ + l) |= Ap^nTEJJMp (i.e., all processes must also be terminated in the loss- 
less points corresponding to (e, / + !)» because TERMp is a stable, local predicate). Also, 
(ei, / + 1) N ^[A^ec{ AGCEPT^V REFUSE^) A A^^ci AWARD^^yREJECT"^)], because ACCEPT^, 
REFUSEcf AWARD^j and REJECT^ is each local to the (terminated) process subscripting it. 

By Post-Failure Termination, there is 62 G noloss(6,/+ l) and A > / + 1 such that 
Faihd{e2^g) — 0 and eg 6 noloss(e3jg)j for / + 1 < 3 < /i, and 
(e2,/i) 1= hc^c{AGCEPT^\/ REFUSE^) A Acec( A WARU^V REJECT^). 

Therefore, at least one $p \in \Pi$ executes an event in $(e_2,h) - (e_2,f)$ (by the definition of
knowledge and some process's local knowledge changing), and $(e_2,f) \models \neg\mathit{TERM}_p$. Because
$(e_2,f)$ and $(e,f)$ are indistinguishable to every process, and $\mathit{TERM}_q$ is local to $q$ for all
$q \in \Pi$, we have that $(e,f) \models \neg\mathit{TERM}_p$, so $(e,f)$
is not a terminating point, contradicting our assumption. □

Lemma 23 shows how (1) m's award to c depends on c's acceptance knowledge, and (2)
c's decision to accept depends on m's awarding knowledge. The first part of Lemma 23 below
says intuitively that, in order for m to award to c at some point, m must be sure that c has
received enough information to accept. This is because otherwise c may fail, recover, and need
to decide without receiving any more messages; then c cannot gain the knowledge it needs to
accept and must therefore refuse, violating the Decision Harmony property.* The second
part of Lemma 23 has the corresponding assertion needed for c to accept.

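Lemma 23 In any C-system $M$ which supports independent recovery, for any $c \in \mathcal{C}$,

1. if $K_c\varphi$ is c-receive-dependent and $M \models \mathit{ACCEPT}_c \supset K_c\varphi$,
then $M \models \mathit{AWARD}^c_m \supset K_mK_c\varphi$.

2. if $K_m\varphi$ is m-receive-dependent and $M \models \mathit{AWARD}^c_m \supset K_m\varphi$,
then $M \models \mathit{ACCEPT}_c \supset K_cK_m\varphi$.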

Proof: We prove the first; the second follows analogously. We prove this result using a
series of three support claims (assume $M \models \mathit{ACCEPT}_c \supset K_c\varphi$, as stated above):

Claim 1: $M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset [K_c\varphi \lor \Box\Diamond\mathit{FAILED}_c]$.
(If the manager has awarded to c and c is failed,
then either c knows $\varphi$ or c will keep failing (infinitely often).)

Claim 2: $M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset K_c\varphi$.
(If m has awarded to c and c is failed, then c knows $\varphi$.)

Claim 3: $M \models \mathit{AWARD}^c_m \supset K_c\varphi$.
(If m has awarded to c, then c knows $\varphi$.)

* Unlike "agreement" problems, in which the post-failure decisions of faulty processes are irrelevant [Hadz89],
commitment problems impose the same consistency requirements on all processes, whether they decide before or
after failures. This motivates m, when it awards to c, to know that c has received enough information to decide
consistently.

Proof of claim 1: $M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset [K_c\varphi \lor \Box\Diamond\mathit{FAILED}_c]$.
(Because $\mathit{AWARD}^c_m$ remains true in any extension, the only way to prevent a failed
c from needing to know $\varphi$ (in order to accept independently) is by ensuring that c,
in every extension, fails infinitely often (and, therefore, need not decide).)

Assume bwoc that there is some $(e,f) \in \mathrm{Pts}(\mathcal{E})$ such that
$(e,f) \models \mathit{AWARD}^c_m \land \mathit{FAILED}_c \land \neg K_c\varphi \land \neg\Box\Diamond\mathit{FAILED}_c$. Therefore, there is $(e_1,g)$
extending $(e,f)$ such that $(e_1,g) \models \neg\Diamond\mathit{FAILED}_c$. Therefore, there is $(e_2,h)$ extending
$(e_1,g)$ such that $(e_2,h') \models \neg\mathit{FAILED}_c$, for all $h' > h$. Now $(e_2,h)$ extends $(e,f)$,
and $(e_2,h) \models \mathit{AWARD}^c_m$ (by stability).

[Find an "earliest" point at which c recovers after $(e,f)$; this is guaranteed to exist,
by the above.] Let $(e_3,i)$ extend $(e,f)$ such that $(e_3,j) \models \mathit{FAILED}_c$, for all $f \leq j < i$,
$(e_3,i) \models \neg\mathit{FAILED}_c$, and there is no $e_4$ extending $(e,f)$ and $f < k < i$ such that
$(e_4,k) \models \neg\mathit{FAILED}_c$. Therefore, $\mathrm{RECV}(m,p) \notin e_3(i-1,c) - e_3(f,c)$, for all messages
$m \neq \lambda$ and $p \in \Pi \setminus \{c\}$. Now, because $K_c\varphi$ is c-receive-dependent, $(e_3,i-1) \models \neg K_c\varphi$.
By stability, $(e_3,i-1) \models \mathit{AWARD}^c_m$.

(*) Because $M$ supports Nontrivial Process Failure, there is $e_5$ extending $(e_3,i)$
such that $(e_5,j) \models \neg\mathit{FAILED}_c$, for all $j \geq i$.

(**) Because $M$ supports independent recovery, there is $k > i$ such that
$(e_5,k) \models \mathit{ACCEPT}_c \lor \mathit{REFUSE}_c$, and $\mathrm{RECV}(m,p) \notin e_5(k,c) - e_5(i-1,c)$, for all
messages $m \neq \lambda$ and $p \in \Pi \setminus \{c\}$ (i.e., c receives no message before it decides). Because
$K_c\varphi$ is c-receive-dependent, $(e_5,k) \models \neg K_c\varphi$. Therefore, $(e_5,k) \models \mathit{REFUSE}_c$, and,
by stability, $(e_5,k) \models \mathit{AWARD}^c_m \land \mathit{REFUSE}_c$, violating Decision Harmony. □



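Proof of claim 2: $M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset K_c\varphi$.

In any system which is subject to process failure and recovery (Definition 3), any
subset of failed processes may recover. Therefore, given any point $(e,f) \in \mathrm{Pts}(\mathcal{E})$
such that $(e,f) \models \mathit{FAILED}_c$, there is at least one point $(e_1,g) \in \mathrm{Pts}(\mathcal{E})$ extending
$(e,f)$ such that $(e_1,g) \models \neg\mathit{FAILED}_c$. By Nontrivial Process Failure, $(e_1,g) \models
\neg\Diamond\mathit{FAILED}_c$, and because $(e_1,g)$ extends $(e,f)$, $(e,f) \models \neg\Box\Diamond\mathit{FAILED}_c$. For any
point $(e_2,h) \in \mathrm{Pts}(\mathcal{E})$ such that $(e_2,h) \models \neg\mathit{FAILED}_c$, $(e_2,h) \models \neg\Diamond\mathit{FAILED}_c$, by
Nontrivial Process Failure. Therefore, every point of $M$, failed or not, satisfies $\neg\Box\Diamond\mathit{FAILED}_c$; that is, $M \models \neg\Box\Diamond\mathit{FAILED}_c$.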






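Therefore, $M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset [K_c\varphi \lor \Box\Diamond\mathit{FAILED}_c]$

(from claim 1) reduces to

$M \models (\mathit{AWARD}^c_m \land \mathit{FAILED}_c) \supset K_c\varphi$. □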



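Proof of claim 3: $M \models \mathit{AWARD}^c_m \supset K_c\varphi$.

Bwoc, assume there is $(e,f) \in \mathrm{Pts}(\mathcal{E})$ such that $(e,f) \models \mathit{AWARD}^c_m \land \neg K_c\varphi$. If
$(e,f) \models \mathit{FAILED}_c$, then we have a contradiction of claim (2). Assume, instead,
that $(e,f) \models \neg\mathit{FAILED}_c$.

Take any $e_1 \in \mathit{fail}(e,f,\{c\})$. $(e_1,f) \models \mathit{AWARD}^c_m$ (because $\mathit{AWARD}^c_m$ is local to m
and $(e_1,f) \sim_m (e,f)$). Further, either $(e_1,f) \models \mathit{FAILED}_c$, or $(e_1,f) \models \mathit{TERM}_c$ (by
the definition of $\mathit{fail}$). In either case, $(e_1,f) \models \neg K_c\varphi$ (because $K_c\varphi$ is c-receive-
dependent, by hypothesis).

In the case that $(e_1,f) \models \mathit{FAILED}_c$, we have $(e_1,f) \models \mathit{AWARD}^c_m \land \mathit{FAILED}_c \land \neg K_c\varphi$,
violating claim (2). In the case that $(e_1,f) \models \mathit{TERM}_c$, we have $(e_1,f) \models \mathit{ACCEPT}_c$
(by Lemma 22 and Decision Harmony), but then $(e_1,f) \models K_c\varphi$, a contradiction.

Therefore, $(e,f) \models K_c\varphi$. □

Therefore, $M \models \mathit{AWARD}^c_m \supset K_c\varphi$, and by the locality of $\mathit{AWARD}^c_m$ to m and Lemma 8,
$M \models \mathit{AWARD}^c_m \supset K_mK_c\varphi$. □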

For the purpose of the following lemmas, we will call part 1 of Lemma 23 the Award Knowledge
Rule 23 and part 2 the Accept Knowledge Rule 23.

Henceforth, we abbreviate $\bigvee_{x \in \mathcal{D}_c}\mathit{DEPEND}^x_c$ by $\mathit{DEPEND}_c$. Lemma 24 shows that (1) c
must receive a message for the knowledge level $(K_cK_m)^jK_c\mathit{DEPEND}_c$* to hold; and (2) m
must receive a message for $(K_mK_c)^i\mathit{DEPEND}_c$ to hold.

Lemma 24 In a C-system $M$ subject to process failures,

• $(K_cK_m)^jK_c\mathit{DEPEND}_c$ is c-receive-dependent and initially false, for all $j \geq 0$; and

• $(K_mK_c)^i\mathit{DEPEND}_c$ is m-receive-dependent and initially false, for all $i \geq 1$.

Proof: We prove the first by induction on the knowledge nesting level j; the second follows 
similarly by induction on the knowledge nesting level i. 

Base case: $j = 0$. We claim that $K_c\mathit{DEPEND}_c$ is c-receive-dependent. This holds from the
requirement of a $\mathit{DEPEND}_c$-message chain (Lemma 19). $K_c\mathit{DEPEND}_c$ is initially false
because $\mathit{DEPEND}_c$ is so (by No Predetermined Dependencies).

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$.
Therefore, $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent and initially false. We now
claim $K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent. By Lemma 11, introspection,
and the Message Chain Theorem, $K_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is m-receive-dependent.
Further, $K_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is initially false, because $\mathit{DEPEND}_c$ is (by No
Predetermined Dependencies). Therefore, by Lemma 11, introspection, and the
Message Chain Theorem, $K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent; further,
$K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is initially false, because $\mathit{DEPEND}_c$ is (by No
Predetermined Dependencies). □

* For any $p,q \in \Pi$, $j > 0$, we abbreviate $K_pK_q\,K_pK_q \ldots K_pK_q\,\varphi$ ($j$ repetitions of $K_pK_q$) by $(K_pK_q)^j\varphi$.
Similarly, we abbreviate $K_p\Diamond K_q\Diamond\,K_p\Diamond K_q\Diamond \ldots K_p\Diamond K_q\Diamond\,\varphi$ ($j$ repetitions of $K_p\Diamond K_q\Diamond$) by $(K_p\Diamond K_q\Diamond)^j\varphi$. Finally, we abbreviate ...

Lemma 25 shows that (1) $(K_cK_m)^jK_c\mathit{DEPEND}_c$ must hold for arbitrarily deep nesting in
order for c to accept, and (2) $(K_mK_c)^i\mathit{DEPEND}_c$ must hold for arbitrarily deep nesting in
order for m to award to c.

Lemma 25 For any C-system $M$ supporting independent recovery,

1. $M \models \mathit{ACCEPT}_c \supset (K_cK_m)^jK_c\mathit{DEPEND}_c$, for any $c \in \mathcal{C}$ and for all $j \geq 0$.

2. $M \models \mathit{AWARD}^c_m \supset (K_mK_c)^i\mathit{DEPEND}_c$, for any $c \in \mathcal{C}$ and for all $i \geq 1$.

Proof: We show the first; the second follows immediately. We prove this by induction on $j$.

Base case: $j = 0$. The claim that $M \models \mathit{ACCEPT}_c \supset K_c\mathit{DEPEND}_c$ holds by locality of $\mathit{ACCEPT}_c$
and Lemma 14.

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$. Therefore,
$M \models \mathit{ACCEPT}_c \supset (K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$. We now claim that $M \models \mathit{ACCEPT}_c \supset
K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$. This holds immediately from (a) an application of the
Accept Knowledge Rule 23 on $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ (which is c-receive-dependent
by Lemma 24), and (b) an application of the Award Knowledge Rule 23 on the result of
application (a) (which is m-receive-dependent, also by Lemma 24). □

Lemma 26 establishes that consecutive message chains are required to establish each knowl- 
edge level in Lemma 25. 

Lemma 26 In a C-system $M$ subject to process failures,

• for all $j \geq 0$, if, for some $(e,f) \in \mathrm{Pts}(\mathcal{E})$ and $c \in \mathcal{C}$, $(e,f) \models (K_cK_m)^jK_c\mathit{DEPEND}_c$,
then there is a sequence of consecutive message chains $m \to c\,(\to m \to c)^j$ in $(e,0)$ to
$(e,f)$; and

• for all $i > 0$, if, for some $(e,f) \in \mathrm{Pts}(\mathcal{E})$ and $c \in \mathcal{C}$, $(e,f) \models (K_mK_c)^i\mathit{DEPEND}_c$,
then there is a sequence of consecutive message chains $c \to m\,(\to c \to m)^i$ in $(e,0)$ to
$(e,f)$.

Proof: We prove the first by induction on the length of the sequence of chains, $j$; the second
follows analogously by induction on $i$.

Base case: $j = 0$. The claim is that, for any $(e,f) \in \mathrm{Pts}(\mathcal{E})$, if $(e,f) \models K_c\mathit{DEPEND}_c$, then
there is a $\mathit{DEPEND}_c$-message chain from m to c in interval $(e,0)$ to $(e,f)$. This holds
by Lemma 19.

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$. Now we claim
that $K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ requires a sequence of message chains
$m \to c\,(\to m \to c)^j$. That is, if $(e,f) \models K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$, then there is a
sequence of consecutive message chains $m \to c\,(\to m \to c)^j$ in interval $(e,0)$ to
$(e,f)$. From the inductive hypothesis, we can assert the existence of the chain sequence
$m \to c\,(\to m \to c)^{j-1}$, required to establish $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$. By Lemma
24, $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent in $M$. Therefore, by Lemma 11,
$(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is c-failure-dissociated in $M$. By Lemma 24, $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$
is initially false. By introspection, $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ is local to c. Therefore, by
the Message Chain Theorem, there is a $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$-message chain $c \to m$
in $(e,0)$ to $(e,f)$, to establish $K_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$. Now this chain must strictly
follow $m \to c\,(\to m \to c)^{j-1}$, because $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ must hold at the start
of the new $c \to m$, and $(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$ cannot hold any earlier than the end
of $m \to c\,(\to m \to c)^{j-1}$. From this, we conclude the existence of
$m \to c\,(\to m \to c)^{j-1} \to m$. By similar reasoning, there is a $K_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$-message
chain $m \to c$, to establish $K_cK_m(K_cK_m)^{j-1}K_c\mathit{DEPEND}_c$. We can conclude the existence of
$m \to c\,(\to m \to c)^{j-1} \to m \to c$, or $m \to c\,(\to m \to c)^j$. □

Theorem 27 There is no C-system which supports independent recovery.

Proof: Bwoc, fix any C-system $M$ which supports independent recovery. From Lemmas 25
and 26, we may conclude that, in order for c to decide to accept, there must be an infinitely
long sequence of message chains from c to m and back in $M$. If there is some $(e,f) \in \mathrm{Pts}(\mathcal{E})$
such that $(e,f) \models \mathit{ACCEPT}_c$, the largest possible consecutive message chain length is $f$, so
at most there exists $m \to c\,(\to m \to c)^{\frac{f}{2}-1}$ in $(e,0)$ to $(e,f)$. By Lemma 25, however,
$m \to c\,(\to m \to c)^j$ also exists in $(e,0)$ to $(e,f)$ for all $j > \frac{f}{2}-1$. This is a contradiction.
Essentially, the required infinitely long sequence of message chains cannot occur in a finite
portion of execution $e$. Therefore, c may never accept in $M$.

Accept decisions must be possible in $M$. Therefore, $M$ is not a C-system. □
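The counting behind this argument can be made concrete with a small sketch. It assumes, purely for illustration, that each message chain occupies at least one time step; the function names are hypothetical and not part of the paper.

```python
def chains_needed(j: int) -> int:
    """Number of consecutive chains in the pattern m -> c (-> m -> c)^j."""
    return 1 + 2 * j


def max_nesting(f: int) -> int:
    """Largest j whose chain sequence fits within f time steps.

    Assumes at least one step per chain (an illustrative assumption).
    Returns -1 when not even the single chain m -> c fits.
    """
    return (f - 1) // 2 if f >= 1 else -1


if __name__ == "__main__":
    for f in (1, 5, 8):
        j = max_nesting(f)
        # The next nesting level always needs more chains than f steps allow.
        print(f, j, chains_needed(j + 1) > f)
```

Whatever finite $f$ is chosen, level `max_nesting(f) + 1` already needs more chains than fit before $(e,f)$, whereas acceptance requires every level.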
7.2 Weak Termination, Process Recovery, and Bounded Time Communication

The result we give below states that there is no protocol for negotiated commitment which 
guarantees weak termination in a system in which processes may fail and recover and in which 
messages may spend a bounded amount of time in transit. If such protocols existed, then 
a decided process could stop executing events (terminate) regardless of the state of other, 
failed, processes. As with independent recovery, this impossibility result means that, in any 
negotiated commitment protocol, at least one "live" process must have the knowledge and 
longevity to communicate with recovering processes to help them decide, regardless of how long 
those processes remain failed. The proof of this result uses lemmas similar to those in the proof 
of the impossibility of independent recovery. 

The first part of Lemma 28 below says intuitively that, in order for m to award to c at
some point, m must be sure that c has received enough information to accept. This is because
otherwise the system may terminate without any more messages being received (by Theorem
12), so that c cannot gain the knowledge it needs to accept and must therefore refuse (by
Lemma 22), violating the Decision Harmony property. The second part of Lemma 28 has
the corresponding assertion needed for c to accept.

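Lemma 28 In any weakly terminating C-system $M$ which is k-transit bounded and subject
to process failure and recovery, for any $c \in \mathcal{C}$,

1. if $K_c\varphi$ is c-receive-dependent and $M \models \mathit{ACCEPT}_c \supset K_c\varphi$,
then $M \models \mathit{AWARD}^c_m \supset K_mK_c\varphi$.

2. if $K_m\varphi$ is m-receive-dependent and $M \models \mathit{AWARD}^c_m \supset K_m\varphi$,
then $M \models \mathit{ACCEPT}_c \supset K_cK_m\varphi$.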

Proof: We prove the first; the second follows analogously. We use the following claim:

$M \models \mathit{AWARD}^c_m \supset K_c\varphi$.

Proof of claim: Assume bwoc that there is $(e,f) \in \mathrm{Pts}(\mathcal{E})$ such that $(e,f) \models \mathit{AWARD}^c_m \land \neg K_c\varphi$.
Therefore, $(e,f) \models \neg\mathit{ACCEPT}_c$. By Theorem 12, there is $(e_1,h) \in \mathrm{Pts}(\mathcal{E})$ which
is a terminating extension of $(e,f)$ such that no process receives a nonnull message
after $(e_1,f)$. By Lemma 22, $(e_1,h) \models \mathit{ACCEPT}_c \lor \mathit{REFUSE}_c$, and by $K_c\varphi$ being
c-receive-dependent, $(e_1,h) \models \mathit{REFUSE}_c$. Therefore, $(e_1,h) \models \mathit{AWARD}^c_m \land \mathit{REFUSE}_c$,
violating Decision Harmony. □

Because $\mathit{AWARD}^c_m$ is local to m, whenever $\mathit{AWARD}^c_m$ holds in a particular point, it holds
in all points which m considers similar. Therefore, $K_c\varphi$ also holds in those points, so
$M \models \mathit{AWARD}^c_m \supset K_mK_c\varphi$. □

For the purpose of the next lemma, we will call part 1 of Lemma 28 the Award Knowledge
Rule 28 and part 2 the Accept Knowledge Rule 28. Lemma 29 shows that (1) $(K_cK_m)^jK_c\mathit{DEPEND}_c$
must hold for arbitrarily deep nesting in order for c to accept, and (2) $(K_mK_c)^i\mathit{DEPEND}_c$ must
hold for arbitrarily deep nesting in order for m to award to c.

Lemma 29 For any weakly terminating C-system $M$ which is k-transit bounded and subject
to process failures and recovery,

1. $M \models \mathit{ACCEPT}_c \supset (K_cK_m)^jK_c\mathit{DEPEND}_c$, for any $c \in \mathcal{C}$ and for all $j \geq 0$;

2. $M \models \mathit{AWARD}^c_m \supset (K_mK_c)^i\mathit{DEPEND}_c$, for any $c \in \mathcal{C}$ and for all $i \geq 1$.

Proof: This proof is the same as that for Lemma 25, except that the Award Knowledge
Rule 28 and the Accept Knowledge Rule 28 are used here. □

From the fact that commitment between c and m requires infinitely long sequences of
message chains between c and m and the fact that commitment must be possible in any C- 
system, we conclude our impossibility result. 

Theorem 30 There is no weakly terminating C-system which is k-transit bounded and subject
to process failures and recovery. □

There is a strong connection between a system supporting independent recovery and a system
being weakly terminating, k-transit bounded, and subject to process failures and recovery.
The reader may have noticed the great similarities between Lemmas 23 and 28, Lemmas 25 and 
29, and Theorems 27 and 30 (respectively). This strong connection is related to the possibility 
in each system of deciding independently; see Appendix B. 

The analogue of Theorem 30 for atomic commitment tells us the following: if a round- 
based atomic commitment protocol is resilient to process failures and recovery and such that 
a message may be received only in the round in which it is sent, then the protocol may run
forever. 

From the knowledge levels in Lemma 29, one might think that one can prove Theorem 30
using common knowledge [HaMo90]. Indeed, another way to prove this theorem would be to
show that commitment in the given systems requires common knowledge among {x} and m of
$\mathit{DEPEND}_c$, which, as a direct corollary of the Message Chain Theorem, is impossible. The
current results on attaining common knowledge do not address systems with process failures
(although this extension should not be difficult). The Message Chain Theorem allows us to
reason about both finite and infinite knowledge levels in several kinds of systems, including
systems in which one cannot attain common knowledge. Further, we can reason about incremental
gains in a process's knowledge through communication. Finally, the Message Chain Theorem
applies to a broad class of problems.

We note that commitment under other system assumptions may require a variant of common
knowledge of some proposition; for example, [HadzSS] argues the analogue of Theorem 38 for
nonblocking atomic commitment by showing (i) the need for eventual common knowledge of a
particular proposition, and (ii) the impossibility of attaining that common knowledge in the given
systems. Theorem 38 instead uses our proof technique, based on nested knowledge levels and
consecutive message chains.

7.3 Nonblocking Behaviour, Termination, and Communication Failures

Informally, we say that a process is blocked when it must await the repair of failures before
proceeding [Skee82, SkSt83, BeHG87]. Blocking is undesirable, because it may cause participants
to wait for an arbitrarily long time before deciding, leaving a contract undecided for arbitrarily
long and uselessly holding any resources which might be required for commitment. Therefore,
nonblocking commitment systems are preferred over blocking ones. We will show some conditions
under which nonblocking behaviour is impossible to achieve.

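Definition 31 A C-system $M$ is called nonblocking if

$M \models \bigwedge_{c \in \mathcal{C}}\Diamond(\mathit{FAILED}_c \lor \mathit{ACCEPT}_c \lor \mathit{REFUSE}_c) \land \bigwedge_{c \in \mathcal{C}}\Diamond(\mathit{FAILED}_m \lor \mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)$.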

That is, in a nonblocking C-system, all nonfailed processes eventually decide one way or the 
other. Notice that this covers a situation in which a process fails, recovers, and does not fail 
again— such a process must decide. 

We show here that no negotiated commitment protocol can achieve nonblocking behaviour 
if the system is subject to permanent communication failures (shown in [Skee82, SkSt83] for 
atomic commitment) or the system is weakly terminating and subject to (even transient)
communication failures (shown in [Hadz90] for atomic commitment). Call a system which has either
of these two properties a target system. The development of this result proceeds in essentially
two parts. First, we demonstrate that we can derive from any target C-system a process-failure-free
counterpart target C-system, and then we derive knowledge levels required to establish a
commitment (award or accept) in a nonblocking, process-failure-free C-system. Second, we
show that certain sequences of consecutive message chains are required in order to attain
certain relevant knowledge levels in any target C-system. Then, to bring the two parts together,
we argue that, in order to establish a commitment in a nonblocking target C-system, an infinite
sequence of consecutive message chains is required. Lemmas 32-34 constitute the first part of
the development, Lemmas 35-37 constitute the second part, and Theorem 38 establishes the
overall result.

Lemma 32 shows that, from any target C-system, we can derive another target C-system 
whose executions are process-failure-free and a subset of the executions of the original system, 
and whose points support primitive propositions iff they did in the original system. 

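Lemma 32 Given any C-system $M = (\mathcal{E}, \mathcal{I})$, there is a process-failure-free C-system $M_{PFF} =
(\mathcal{E}_{PFF}, \mathcal{I}_{PFF})$ such that

1. $\mathcal{E}_{PFF} \subseteq \mathcal{E}$;

2. for any $(e,f) \in \mathrm{Pts}(\mathcal{E}_{PFF})$ and primitive proposition $\varphi$,
$(M_{PFF},e,f) \models \varphi$ iff $(M,e,f) \models \varphi$; and

3. $M_{PFF} \models \neg\mathit{FAILED}_p$, for all $p \in \Pi$.

Proof: We take $\mathcal{E}_{PFF} = \{e \mid e \in \mathcal{E}$ and $\mathrm{FAIL} \notin e(f,p)$, for all $f \in \mathbb{N}$ and $p \in \Pi\}$. Now
$\mathcal{E}_{PFF} \subseteq \mathcal{E}$. Further, $\mathcal{E}_{PFF}$ is a system. For any $\varphi$ we take $\mathcal{I}_{PFF}(\varphi) = \mathcal{I}(\varphi) \cap \mathrm{Pts}(\mathcal{E}_{PFF})$
(that is, any such primitive proposition holds in those points of $\mathcal{E}$ (in which it held) which are
now in $\mathcal{E}_{PFF}$). Now it is straightforward to verify that $\mathcal{E}_{PFF}$ is a system and that all specifications
hold in $M_{PFF}$, because they hold in $M$. □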
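The construction in this proof amounts to a filter followed by a restriction, which the following sketch illustrates on a toy encoding. The data layout (executions as lists of per-step label dictionaries, an interpretation as a map from proposition names to sets of points) and all names are assumptions made for the example only.

```python
from typing import Dict, List, Set, Tuple

Run = List[Dict[str, str]]   # run[f][p] = label of process p at time f (hypothetical)
Point = Tuple[str, int]      # (execution name, time)


def restrict_to_failure_free(
    executions: Dict[str, Run],
    interp: Dict[str, Set[Point]],
) -> Tuple[Dict[str, Run], Dict[str, Set[Point]]]:
    """Keep only executions in which no process ever fails, and restrict
    each primitive proposition to the points of the surviving executions."""
    e_pff = {
        name: run
        for name, run in executions.items()
        if all(label != "FAIL" for step in run for label in step.values())
    }
    points = {(name, f) for name, run in e_pff.items() for f in range(len(run))}
    i_pff = {prop: pts & points for prop, pts in interp.items()}
    return e_pff, i_pff
```

Because the restricted interpretation is an intersection with the surviving points, a primitive proposition holds at a surviving point exactly when it held there originally, which is item 2 of the lemma.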

In any process-failure-free C-system $M_{PFF}$, the nonblocking property becomes $M_{PFF} \models
\bigwedge_{c \in \mathcal{C}}\Diamond(\mathit{ACCEPT}_c \lor \mathit{REFUSE}_c) \land \bigwedge_{c \in \mathcal{C}}\Diamond(\mathit{AWARD}^c_m \lor \mathit{REJECT}^c_m)$. Lemma 32 implies that it is
enough, for our impossibility result, to show that there is no process-failure-free nonblocking
target C-system.

Now we identify some knowledge states required for decision in any process-failure-free
nonblocking C-system.* Lemma 33 shows how (1) c's decision to accept depends on m's
awarding knowledge; and (2) m's award to c depends on c's acceptance knowledge.

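Lemma 33 For any process-failure-free, nonblocking C-system $M_{PFF}$, $c \in \mathcal{C}$, and wff $\varphi$,

1. if $M_{PFF} \models \mathit{AWARD}^c_m \supset K_m\varphi$, then $M_{PFF} \models \mathit{ACCEPT}_c \supset K_c\Diamond K_m\varphi$; and

2. if $M_{PFF} \models \mathit{ACCEPT}_c \supset K_c\varphi$, then $M_{PFF} \models \mathit{AWARD}^c_m \supset K_m\Diamond K_c\varphi$.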

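Proof: We prove the first; the second follows analogously. Assume bwoc that there is
some $M_{PFF}$ such that $M_{PFF} \models \mathit{AWARD}^c_m \supset K_m\varphi$, but $M_{PFF} \not\models \mathit{ACCEPT}_c \supset K_c\Diamond K_m\varphi$. Then
there is at least one $(e,f) \in \mathrm{Pts}(\mathcal{E}_{PFF})$ such that $(e,f) \models \mathit{ACCEPT}_c \land \neg K_c\Diamond K_m\varphi$.
Therefore, there is $(e_1,g) \sim_c (e,f)$ such that $(e_1,g) \models \mathit{ACCEPT}_c \land \neg\Diamond K_m\varphi$. Therefore, there is $e_2$
extending $(e_1,g)$ such that $(e_2,h) \models \neg K_m\varphi$, for all $h > g$. Therefore, by the antecedent,
$(e_2,h) \models \neg\mathit{AWARD}^c_m$, for all $h > g$. Therefore, for some $i > g$, $(e_2,i) \models \mathit{REJECT}^c_m$ (because
the system is process-failure-free and nonblocking). But $(e_1,g) \models \mathit{ACCEPT}_c$, and $\mathit{ACCEPT}_c$ is
stable, so $(e_2,i) \models \mathit{ACCEPT}_c \land \mathit{REJECT}^c_m$, violating Decision Harmony. □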

For use in the next lemma, we call part 1 of Lemma 33 the Award Knowledge Rule 33, and 
we call part 2 the Accept Knowledge Rule 33. Lemma 34 shows that, in a process-failure-free, 
nonblocking C-system, in order to accept, c must have arbitrarily interleaved knowledge about 
m's knowledge of c's knowledge of the dependency. Similarly, to award to c, m must have
arbitrarily interleaved knowledge of c's knowledge of the dependency.



* In Section 8, we develop the knowledge requirements in general nonblocking C-systems. Note that Hadzilacos
(1990) assumes process-failure-free systems in proving his results on the impossibility of nonblocking atomic
commitment protocols; he does not give general knowledge requirements.

Lemma 34 For any process-failure-free, nonblocking C-system $M_{PFF}$ and $c \in \mathcal{C}$,

• $M_{PFF} \models \mathit{ACCEPT}_c \supset (K_c\Diamond K_m\Diamond)^jK_c\mathit{DEPEND}_c$, for all $j \geq 0$.

• $M_{PFF} \models \mathit{AWARD}^c_m \supset K_m\Diamond(K_c\Diamond K_m\Diamond)^iK_c\mathit{DEPEND}_c$,
for any $c \in \mathcal{C}$ and for all $i \geq 1$.

Proof: We show the first; the second follows similarly. We prove this by induction on $j$.

Base case: $j = 0$. The claim that $M_{PFF} \models \mathit{ACCEPT}_c \supset K_c\mathit{DEPEND}_c$ follows from the
locality of $\mathit{ACCEPT}_c$ and Lemma 14.

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$. Therefore,
$M_{PFF} \models \mathit{ACCEPT}_c \supset (K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. We now claim that
$M_{PFF} \models \mathit{ACCEPT}_c \supset K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. This holds immediately
from (a) an application of the Accept Knowledge Rule 33 on
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$, and (b) an application of the Award Knowledge Rule 33
on the result of application (a). □

From the existence of a dependency message chain (Lemma 19), c must receive a message 
to come to know DEPEND^. 

Lemma 35 In any C-system $M$ subject to communication failures, for $c \in \mathcal{C}$, wff $K_c\mathit{DEPEND}_c$
is c-receive-dependent. □

Lemma 36 shows that c must receive a message to attain certain useful levels of interleaved
knowledge, namely, $(K_c\Diamond K_m\Diamond)^jK_c\mathit{DEPEND}_c$, and m must receive a message for
$K_m\Diamond(K_c\Diamond K_m\Diamond)^iK_c\mathit{DEPEND}_c$ to hold.

Lemma 36 In any target C-system $M$,

1. $(K_c\Diamond K_m\Diamond)^jK_c\mathit{DEPEND}_c$ is c-receive-dependent and initially false, for all $j \geq 0$.

2. $K_m\Diamond(K_c\Diamond K_m\Diamond)^iK_c\mathit{DEPEND}_c$ is m-receive-dependent and initially false, for all $i \geq 1$.

Proof: We prove the first, by induction on the knowledge nesting level $j$ (the second follows
analogously by induction on $i$).

Base case: $j = 0$. That $K_c\mathit{DEPEND}_c$ is c-receive-dependent is shown in Lemma 35. $K_c\mathit{DEPEND}_c$
is initially false because $\mathit{DEPEND}_c$ is so (by No Predetermined Dependencies).

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$. Therefore,
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent and initially false. We now claim
that the same holds for
$K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. $(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is local to c, by
introspection. Therefore, by hypothesis and Lemma 17,
$K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is m-receive-dependent. By hypothesis,
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent. Therefore, by Lemma 13 and Lemma
10, $\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c \equiv (K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$, which by hypothesis
is initially false. Therefore, $K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is initially false.

$K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is also local to m. Therefore, by Lemma 17,
$K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent. Further, by
m-receive-dependency, Lemma 13, and Lemma 10,
$\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c \equiv K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$, which is
initially false. Therefore, $K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is initially false. □

Lemma 37 characterizes the sequence of consecutive message chains required to establish 
the knowledge levels in Lemma 36. 

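Lemma 37 In any target C-system $M$,

1. for all $j \geq 0$, if, for some $(e,f) \in \mathrm{Pts}(\mathcal{E})$ and $c \in \mathcal{C}$, $(e,f) \models (K_c\Diamond K_m\Diamond)^jK_c\mathit{DEPEND}_c$,
then there is a sequence of consecutive message chains $m \to c\,(\to m \to c)^j$ in $(e,0)$ to
$(e,f)$;

2. for all $i \geq 0$, if, for some $(e,f) \in \mathrm{Pts}(\mathcal{E})$ and $c \in \mathcal{C}$, $(e,f) \models K_m\Diamond(K_c\Diamond K_m\Diamond)^iK_c\mathit{DEPEND}_c$,
then there is a sequence of consecutive message chains $m \to c\,(\to m \to c)^i \to m$ in
$(e,0)$ to $(e,f)$.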

Proof: We prove the first, by induction on the length of the sequence of chains, $j$ (the
second follows analogously by induction on $i$).

Base case: $j = 0$. The claim is that, for any $(e,f) \in \mathrm{Pts}(\mathcal{E})$, if $(e,f) \models K_c\mathit{DEPEND}_c$, then
there is a $\mathit{DEPEND}_c$-message chain from m to c in interval $(e,0)$ to $(e,f)$. This holds
by Lemma 19.

Inductive step: $j > 0$. Assume the inductive hypothesis holds for $j-1$. Now we claim that
$K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ requires a sequence of message chains
$m \to c\,(\to m \to c)^j$. That is, if $(e,f) \models K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$, then
there is a sequence of consecutive message chains
$m \to c\,(\to m \to c)^j$ in interval $(e,0)$ to $(e,f)$. From the inductive hypothesis, we can
assert the existence of the chain sequence $m \to c\,(\to m \to c)^{j-1}$, required to establish
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ in $(e,0)$ to $(e,f)$.

Consider $K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. By introspection, $(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$
is local to c. By Lemma 36, $(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is c-receive-dependent in $M$.
Therefore, by Lemma 13, $(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is nontrivial in $M$. Therefore,
by Lemma 10, $M \models (K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c \equiv \Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. By
Lemma 36, $(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is initially false in $M$. Therefore,
$\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ is initially false in $M$, nontrivial in $M$, and local to c.
Therefore, by the Message Chain Theorem, a
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$-message chain $c \to m$ is required in $(e,0)$ to $(e,f)$ to
establish $K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$.

Now this new chain must strictly follow $m \to c\,(\to m \to c)^{j-1}$, because
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ must hold at the start of the new $c \to m$, and
$(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$ cannot hold any earlier than the end of
$m \to c\,(\to m \to c)^{j-1}$. From this, we conclude the existence of $m \to c\,(\to m \to c)^{j-1} \to m$.
By similar reasoning, there is a $K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$-message chain $m \to c$, to establish
$K_c\Diamond K_m\Diamond(K_c\Diamond K_m\Diamond)^{j-1}K_c\mathit{DEPEND}_c$. We can conclude the existence of
$m \to c\,(\to m \to c)^{j-1} \to m \to c$, or $m \to c\,(\to m \to c)^j$. □

Theorem 38 There is no nonblocking C-system M which is either subject to permanent com- 
munication failures or weakly terminating and subject to communication failures. 

Proof: Bwoc, fix any nonblocking target C-system $M$. By Lemma 32, there is a
process-failure-free, nonblocking target C-system $M_{PFF}$. From Lemmas 34 and 37, we may conclude
that, in order for c to decide to accept, there must be an infinitely long sequence of message
chains from c to m and back in $M_{PFF}$. That is, if there is some $(e,f) \in \mathrm{Pts}(\mathcal{E}_{PFF})$ such that
$(e,f) \models \mathit{ACCEPT}_c$, then there must be such an infinitely long sequence, which cannot occur in
a finite portion of execution $e$. Therefore, c may never accept in $M_{PFF}$.

Accept decisions must be possible in $M_{PFF}$. Therefore, $M_{PFF}$ is not a C-system, a
contradiction. □

8 Knowledge Levels in Nonblocking Commitment Systems 

In Section 7.3, we derived knowledge levels for process-failure-free, nonblocking commitment
systems. We now determine knowledge levels required for decision in general nonblocking
systems, which may admit process failures. As one might expect, the knowledge levels are more
complex in these systems than in those of Section 7.3.

We have shown (in Lemma 14) that a contractor c which has accepted knows it is a
codependent; now it is also true that c knows that eventually everyone in the dependency
set will fail or know that it is a codependent. For $P \subseteq \Pi$, we abbreviate by $F_P\varphi$
the formula $\bigwedge_{p \in P}(K_p\varphi \lor \mathit{FAILED}_p)$; that is, every process in $P$ knows $\varphi$ or is failed [Hadz87].
Now, for each $x \in \mathcal{D}_c$, $M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\Diamond F_x\mathit{DEPEND}^x_c$. Further, an
accepting contractor must know that eventually each codependent will fail or will eventually
know that each codependent will fail or will eventually know that it is a codependent:
$M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\Diamond[F_x\Diamond[F_x\mathit{DEPEND}^x_c]]$; and so on. More generally, the
following holds, for all $i \in \mathbb{N}$:

$M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\sigma_i$   (1), for $i \geq 1$

where $\sigma_0 = \mathit{DEPEND}^x_c$, and $\sigma_{i+1} = \Diamond F_x\sigma_i$, for $i \geq 0$.
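To see how quickly the formulas $\sigma_i$ grow, the recurrence can be unrolled mechanically. The sketch below builds the formulas as plain strings; the ASCII rendering (`<>` for the eventuality operator, `&` and `|` for conjunction and disjunction) and the function names are choices made for this illustration only.

```python
from typing import Sequence


def expand_F(processes: Sequence[str], phi: str) -> str:
    """Unfold F_P(phi): every process in P knows phi or is failed."""
    return " & ".join(f"(K_{p} {phi} | FAILED_{p})" for p in processes)


def sigma(i: int, x: Sequence[str], base: str = "DEPEND") -> str:
    """Unroll sigma_0 = base, sigma_{k+1} = <> F_x(sigma_k), for i steps."""
    formula = base
    for _ in range(i):
        formula = f"<>[{expand_F(x, formula)}]"
    return formula


if __name__ == "__main__":
    # With a single codependent d, each level adds one knowledge operator.
    for level in range(3):
        print(level, sigma(level, ["d"]))
```

With a dependency set of $n$ processes, each level multiplies the number of occurrences of the base proposition by $n$, so the unrolled formula grows exponentially in $i$.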

Now we show that each member of this collection is valid in a nonblocking C-system. 

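Lemma 39 For any nonblocking C-system $M$, $c \in \mathcal{C}$, $x \in \mathcal{D}_c$,
$M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\sigma_i$, for all $i \geq 1$.

Proof: By induction on $i$.

Base case: $i = 1$. The claim is $M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\Diamond F_x\mathit{DEPEND}^x_c$, for all
$c \in \mathcal{C}$, $x \in \mathcal{D}_c$. Assume bwoc that there is $(e,f) \in \mathrm{Pts}(\mathcal{E})$ such that
$(e,f) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \neg K_c\Diamond F_x\mathit{DEPEND}^x_c$. Then there is $(e_1,g) \sim_c (e,f)$ such
that $(e_1,g) \models \neg\Diamond F_x\mathit{DEPEND}^x_c \land \mathit{ACCEPT}_c$. Therefore, in some execution $e_2$ extending
$(e_1,g)$, for some $d \in x$, $(e_2,h) \models \neg(\mathit{FAILED}_d \lor K_d\mathit{DEPEND}^x_c)$, for all $h > g$. Therefore,
there is some $i > g$ such that $(e_2,i) \models \mathit{REFUSE}_d$ (because the system is nonblocking,
and $M$ supports Dependent Acceptance and Exclusive Dependencies), but
$(e_2,i) \models \mathit{REFUSE}_d \land \mathit{ACCEPT}_c$, violating Decision Harmony.

Inductive step: Assume the hypothesis is true for $i-1$, so $M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset
K_c\sigma_{i-1}$; we show it for $i$. The claim is $M \models [\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\sigma_i$, or $M \models
[\mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c] \supset K_c\Diamond F_x\sigma_{i-1}$, or $M \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \supset K_c\Diamond\bigwedge_{d \in x}
(K_d\sigma_{i-1} \lor \mathit{FAILED}_d)$.

Bwoc, assume for some $(e,f)$ that
$(e,f) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \neg K_c\Diamond\bigwedge_{d \in x}(K_d\sigma_{i-1} \lor \mathit{FAILED}_d)$. Then there is
$(e_1,g) \sim_c (e,f)$ such that $(e_1,g) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \neg\Diamond\bigwedge_{d \in x}(K_d\sigma_{i-1} \lor \mathit{FAILED}_d)$.
Therefore, without loss of generality, for every $h > g$,
$(e_1,h) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \neg\bigwedge_{d \in x}(K_d\sigma_{i-1} \lor \mathit{FAILED}_d)$. There is thus $d \in x$ such
that $(e_1,h) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \neg(K_d\sigma_{i-1} \lor \mathit{FAILED}_d)$ for all $h > g$. Because this
is a nonblocking system, d must decide in $e_1$; in order not to violate Decision Harmony,
it must accept. Therefore, there is $i > g$ such that
$(e_1,i) \models \mathit{ACCEPT}_c \land \mathit{DEPEND}^x_c \land \mathit{ACCEPT}_d \land \neg K_d\sigma_{i-1}$, violating the inductive
hypothesis. □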

Nonblocking C- systems have a special interprocess knowledge property, illustrated above, 
which we can characterize as follows. Essentially, a deciding process p must know that eventually 
any process which must decide consistently with p must eventually fail or know enough to decide! 
In particular (assume we have a specific x €Vc), 

1. if an accepting contractor must know $\phi$, then the awarding manager must know that
eventually each codependent will know $\phi$ or will fail.
if $M \models \bigwedge_{c \in C}(ACCEPT_c \supset K_c\phi)$
then $M \models [AWARD_c \wedge DEPEND_x] \supset K_m\Diamond F_x\phi$.

2. if an accepting contractor c must know $\phi$, then c must know that eventually each
codependent will fail or know $\phi$.
if $M \models \bigwedge_{c \in C}(ACCEPT_c \supset K_c\phi)$

then $M \models [ACCEPT_c \wedge DEPEND_x] \supset K_c\Diamond F_x\phi$.

3. if a deciding manager must know $\phi$ to award to c, then an accepting c must know that
eventually the manager will fail or know $\phi$.

then $M \models ACCEPT_c \supset K_c\Diamond F_{\{m\}}\phi$.

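For example, let $\phi = DEPEND_x$; item 2 yields the base case of Lemma 39.
For rejection and refusal, we have (assume we have a specific $x \in \mathcal{D}_c$)

4. if $M \models \bigwedge_{c \in C}(REFUSE_c \supset K_c\phi)$

then $M \models [REJECT_c \wedge DEPEND_x] \supset K_m\Diamond F_x\phi$.

5. if $M \models \bigwedge_{c \in C}(REFUSE_c \supset K_c\phi)$

then $M \models [REFUSE_c \wedge DEPEND_x] \supset K_c\Diamond F_x\phi$.

6. if $M \models (REJECT_c \supset K_m\phi)$

then $M \models REFUSE_c \supset K_c\Diamond F_{\{m\}}\phi$.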

We call items 1 through 6 the Rule of Codependent Knowledge Necessitation. Lemmas 33 and
34 prescribe $ACCEPT_c$ and $AWARD_c$ knowledge levels for process-failure-free nonblocking
C-systems; those results use special cases of the Rule of Codependent Knowledge Necessitation.

The need to achieve the knowledge levels required by the Rule of Codependent Knowledge
Necessitation seems to provide a daunting challenge to the protocol designer. After all, the
need to achieve the similar knowledge levels of Lemma 34 led to the nonblocking impossibility
results of Theorem 38, yet the three-phase atomic commitment protocol of Skeen (1982) is
nonblocking in systems free of communication failures. We will now discuss informally how
one might achieve these knowledge levels in a protocol. (That a particular protocol actually
achieves these levels must be proved formally with respect to that protocol.) Assume that
communication is failure-free and that a process q will receive any message sent to it (as long as
q does not fail). Then process p knows, upon sending message m, that q will eventually receive
m (as long as q does not fail).

Consider now that m has selected dependency set x. If m sends a "$DEPEND_x$" message to
each member of x announcing the dependency set, then, under our assumptions, each member
of x will (either fail or) receive the message and therefore know the dependency set. After
sending all $|x|$ messages, then, $K_m\Diamond F_x DEPEND_x$ holds. For $AWARD_c$ to hold for each $c \in x$,
$K_m\Diamond F_x\Diamond F_x DEPEND_x$ must hold (by the Rule of Codependent Knowledge Necessitation). If
our protocol is so designed that m will now send a "$K_m\Diamond F_x DEPEND_x$" message to each
member of x, then (assuming for the moment that m does not fail) $K_m\Diamond F_x\Diamond F_x DEPEND_x$ holds,
even before sending the messages. Now, for $AWARD_c$ to hold, $K_m\Diamond F_x\Diamond F_{\{m\}}\Diamond F_x DEPEND_x$
must also hold. Note, however, that when each $c \in x$ receives "$K_m\Diamond F_x DEPEND_x$", then
$K_c K_m\Diamond F_x DEPEND_x$ holds. Therefore, $K_m\Diamond F_x\Diamond F_{\{m\}}\Diamond F_x DEPEND_x$ holds at the same
time that $K_m\Diamond F_x\Diamond F_x DEPEND_x$ holds, which is before m even sends "$K_m\Diamond F_x DEPEND_x$"
(1). Continuing along this line of reasoning, we can conclude that m may achieve its $AWARD_c$
knowledge by first sending the $|x|$ "$DEPEND_x$" messages and then being committed by the
protocol to send the $|x|$ "$K_m\Diamond F_x DEPEND_x$" messages (unless m fails).

Let us consider how c gains its accepting knowledge. $K_c DEPEND_x$ holds when c receives
"$DEPEND_x$" from m. For $ACCEPT_c$ to hold, the Rule of Codependent Knowledge Necessitation
tells us, $K_c\Diamond F_{\{m\}} DEPEND_x$ must hold; the stronger $K_c K_m DEPEND_x$ holds, however,
when $K_c DEPEND_x$ does. Upon receiving "$K_m\Diamond F_x DEPEND_x$", the required knowledge levels
$K_c\Diamond F_x DEPEND_x$ and $K_c\Diamond F_{\{m\}}\Diamond F_x DEPEND_x$ both hold. Based on our discussion above
(see (1)), $K_c\Diamond F_{\{m\}}\Diamond F_x\Diamond F_x DEPEND_x$ also holds. Again, continuing this line of reasoning,
we may conclude that c achieves its $ACCEPT_c$ knowledge by receiving "$DEPEND_x$" and then
"$K_m\Diamond F_x DEPEND_x$".

Suppose m instead fails before sending all "$K_m\Diamond F_x DEPEND_x$" messages. Then the termination
technique used by the "live" processes must guarantee that each "live" member of x will receive
"$\Diamond F_x DEPEND_x$". If the termination method is such, then again the informal reasoning above
suggests that the requisite knowledge levels are attained.

The point is that, if our protocol is designed properly, then the processes can achieve the
required infinitely nested knowledge levels in a small number of messages. This is the idea
used in the nonblocking, three-phase atomic commitment protocol [Skee82, BeHG87], in which
a PRECOMMIT message serves as a "$DEPEND_x$" message, and the COMMIT message serves
as "$K_m\Diamond F_x DEPEND_x$".
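
The two-round message pattern described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the class `Contractor`, the function `run_manager`, the `fail_after` parameter, and the string labels for the two messages are all hypothetical names introduced here, communication is assumed reliable, and neither the termination technique nor the knowledge semantics is modelled.

```python
from dataclasses import dataclass, field

# Message labels are plain strings chosen for this sketch (assumption).
DEPEND = "DEPEND_x"
K_DEPEND = "K_m <>F_x DEPEND_x"


@dataclass
class Contractor:
    name: str
    failed: bool = False
    received: list = field(default_factory=list)

    def deliver(self, msg):
        # Reliable channel: a message reaches the contractor unless it has failed.
        if not self.failed:
            self.received.append(msg)

    def may_accept(self):
        # Hypothetical rule: accept only after both rounds have arrived, in order.
        return self.received == [DEPEND, K_DEPEND]


def run_manager(members, fail_after=None):
    """Send the two rounds to every member of the dependency set.

    fail_after is the number of messages the manager sends before failing
    (None means the manager does not fail). Returns the number of messages sent.
    """
    sent = 0
    for msg in (DEPEND, K_DEPEND):
        for c in members:
            if fail_after is not None and sent >= fail_after:
                return sent
            c.deliver(msg)
            sent += 1
    return sent


if __name__ == "__main__":
    x = [Contractor("c1"), Contractor("c2"), Contractor("c3")]
    print(run_manager(x))                # 2 * |x| messages in the failure-free case
    print([c.may_accept() for c in x])
```

If the manager fails partway through the second round (for example `fail_after=4` with three members), some live members hold only the first message; in the discussion above, it is the termination technique that must then supply the missing knowledge.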

In a discussion of atomic commitment (under permanent process failures), Dwork and Skeen
(1983) give a nonblocking protocol which uses $2(n-1)$ nonnull messages (excluding the
contract announcements) in the failure-free case, which number they claim as a lower bound for
nonblocking atomic commitment. Based on our discussion above, one might expect a protocol
to require at least $3(n-1)$ messages. Dwork and Skeen achieve their lower bound in a strictly
synchronous protocol which avoids $(n-1)$ messages (the set of messages that one might use, as
suggested in the preceding paragraphs, to establish $K_c\Diamond F_C DEPEND_C$ for all $c \in C$) by
associating with the achievement of $K_c\Diamond F_C DEPEND_C$ (actually $K_c F_C DEPEND_C$) a null message
receipt in a particular step in the protocol, for each $c \in C$. In other words, the synchrony and
absence of messages yield the required knowledge states. The protocol as presented is
nonintuitive and difficult to understand. We found the protocol much easier to understand once we
associated the requisite levels of knowledge with each process at the appropriate step. [Maze89]
casts atomic commitment as negotiated commitment and shows how knowledge-theoretic results
for nonblocking commitment behaviour correspond to results of Skeen (1982).
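
The message counts compared above can be checked with a few lines of arithmetic. The helper `nonnull_messages` and its reading of the totals as "one message per non-coordinating process per round" are assumptions made for this sketch; the text itself gives only the totals $2(n-1)$ and $3(n-1)$ and the saving of $(n-1)$.

```python
def nonnull_messages(n, rounds):
    """Total messages when each of the n - 1 other processes gets one message per round."""
    if n < 1 or rounds < 0:
        raise ValueError("n must be at least 1 and rounds must be nonnegative")
    return rounds * (n - 1)


n = 5
synchronous_count = nonnull_messages(n, 2)   # 2(n - 1), the count attributed to Dwork and Skeen
expected_count = nonnull_messages(n, 3)      # 3(n - 1), the count one might expect
saved = expected_count - synchronous_count   # the (n - 1) messages replaced by null receipts
print(synchronous_count, expected_count, saved)
```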
We specified negotiated commitment systems and derived results on properties of these systems,
using a knowledge-theoretic approach. We determined both (1) what knowledge states a process
needs to commit to an outcome and (2) how to attain the required knowledge, that is, the system
behaviour, in terms of message passing and local computation, required to attain these knowledge
states. Using the knowledge results and the Message Chain Theorem, we gave several
impossibility results: a message lower bound, impossibility of independent recovery, impossibility of
termination under certain conditions, and impossibility of nonblocking behaviour under certain
conditions. The result showing the impossibility of a weakly terminating commitment system
under process failures and recovery and bounded communication time is new; the other
impossibility results used some new intermediate results, extended some previously known results,
and further demonstrated the utility of reasoning about knowledge in analyzing distributed
problems. The impossibility results have a common form: first, determine the knowledge
required for commitment in the given setting; second, show that certain kinds of message chains
are required to achieve that knowledge; and third, show that the message chains (and therefore
the knowledge) are unattainable.

The reader should take away from this discussion the following general points about
reasoning about knowledge: (1) from a specification of a distributed problem, one can derive knowledge
requirements for solving the problem; (2) from the knowledge requirements and system
characteristics, one can derive message chain requirements; and (3) using the knowledge and
communication requirements, one can prove impossibility results, derive underlying protocol structure,
and design protocols. Further, one uses high-level concepts that support the formalization of
some common kinds of informal, intuitive reasoning.

This paper contains a significant exploration of reasoning about knowledge to analyze dis- 
tributed problems. We and others assert that reasoning about knowledge offers insight into 
the nature of distributed computation, under various system assumptions. This approach also 
offers a tool which supports the analysis of problem specifications to yield useful insights into 
solutions. Often, one must appeal to intuition in formulating an approach to a solution to a 
problem, or a proof to a theorem, but the translation of that intuition into a precise form, to 
be manipulated and analyzed, can be difficult. Knowledge theory gives us a formal and direct 
way to articulate, clarify, and verify our informal intuition about the extent to which the local 
state of a process reflects important global states of the system. The use of knowledge theory 
to analyze distributed problems is relatively new. By applying the abstraction of knowledge 
to different kinds of problems, researchers are finding other "knowledge concepts" to be useful, 
such as probabilistic knowledge [FaHa88] or resource-bounded knowledge [Mose88, HaMT88].
We encourage the use of reasoning about knowledge for examining other distributed problems. 
A Closure Properties of Systems of Executions

The proof of the Message Chain Theorem requires three closure properties. Given a set of
executions $\mathcal{E}$, points $(e',g),(e,f) \in Pts(\mathcal{E})$, and process set $P \subseteq \Pi$,
$replace((e',g),P,(e,f))$ is the set of executions $e'' \in \mathcal{E}$ which extend $(e',g)$ such that
(1) in $e''$, each process not in $P$ executes the same event at $(e'',g+1)$ as it did at $(e',g+1)$,
(2) in $e''$, the event executed by each member of $P$ at $(e',g+1)$ is replaced by the event it
executed at $(e,f+1)$, and

(3) the messages from $P$ to $\Pi$ are the messages not sent or received in $e'$ between $g$ and $g+1$,
plus any messages received in $e'$ at $g+1$ by $p \in P$ but not in $e$ by $p$ at $f+1$, plus any messages
newly sent by $P$; and the messages from $P$ to $\Pi$ are whatever was in $e'$ at $g+1$, plus whatever
$P$ received in $e'$ at $g+1$, minus whatever $P$ received in $e$ at $f+1$ or was lost in transit to $P$
in $e$ at $f+1$ [Hadz90].

The progress-closure properties we require here are these:

S1 (Nonreceive Progress): The ability of a process to perform a nonRECV event depends on
the process's behaviour only.

Let $(e,f),(e',g) \in Pts(\mathcal{E})$ and $p \in \Pi$ be such that

• $RECV(m,q) \notin (e,f+1,p)$, for any message $m$, $q \in \Pi$.
Then $replace((e',g),\{p\},(e,f)) \neq \emptyset$.

S2 (Null Receive Progress): A process' ability to receive a null message from another process
depends on the state of the former process, the messages sent to the recipient by the latter
process, and the behaviour of the communication system.

Let $e,e' \in \mathcal{E}$, $g \in \mathbb{N}$, and $q,p \in \Pi$ be such that

• $(e,g) \sim_p (e',g)$,

• $e'(g,\mathcal{N})[\{q\},\{p\}] \subseteq e(g,\mathcal{N})[\{q\},\{p\}]$
(there are no different messages in the communication system from $q$ to $p$
up to time $g$ in $e'$ than in $e$), and

• $RECV(\lambda,q) \in (e,g+1,p)$.

Then $replace((e',g),\{p\},(e,g)) \neq \emptyset$.

S3 (Available Receive Progress): Once a process p has sent a message to another process q,
q's ability to receive the message cannot depend on p's subsequent behaviour; q's ability
to receive the message does depend on the communication system.



These closure properties correspond to a natural set of assumptions on the protocols which "produce" the
behaviours in the execution sets [Maze89].
Let $e,e' \in \mathcal{E}$, $f,g \in \mathbb{N}$, and $q,p \in \Pi$ be such that

• $RECV(m,q) \in (e,g+1,p)$,

• $(q,m,p,f) \in e'(g,\mathcal{N})$ (the message is still available in $e'$ at $g$, as in $e$), and

• letting $T_f = \{(q,m,p,i) \mid (q,m,p,i) \in e'(g,\mathcal{N}) \text{ and } 0 < i < f\}$,
$T_f \subseteq e(g,\mathcal{N})[\{q\},\{p\}]$ (the set of messages available at $(e',g)$, sent before
$f+1$, and from $q$ to $p$ is a subset of those available at $(e,g)$).

Then $replace((e',g),\{p\},(e,g)) \neq \emptyset$.

B Independent Recovery and Termination 

We show a connection between a system supporting independent recovery and a system which
is weakly terminating, k-transit-bounded, and subject to process failures and recovery. First,
we weaken the definition of a C-system which supports independent recovery. In a system which
supports weak independent recovery, a failed process may recover and decide in some extension
without receiving a nonnull message.

Definition 40 In a C-system $M$ which supports weak independent recovery, $M$ is subject to
process failures and recovery, and for all $(e,f) \in Pts(\mathcal{E})$, $f > 0$,

• if, for $c \in C$, $FAIL \in e(f-1,c)$ and $FAIL \notin e(f,c)$,
then there is $(e_1,g)$ extending $(e,f)$ such that $(e_1,g) \models ACCEPT_c \vee REFUSE_c$
and $RECV(m,p) \notin e_1(g,c) - e_1(f-1,c)$
for all messages $m$ and $p \in \Pi \setminus \{c\}$; and

• if $FAIL \in e(f-1,m)$ and $FAIL \notin e(f,m)$,
then there is $(e_1,g)$ extending $(e,f)$ such that $(e_1,g) \models \bigwedge_{c \in C}(AWARD_c \vee REJECT_c)$
and $RECV(m,p) \notin e_1(g,m) - e_1(f-1,m)$
for all messages $m$ and $p \in \Pi \setminus \{m\}$. □
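
As an informal illustration of Definition 40, the following sketch checks one process's event trace for the pattern the definition asks for: the process is failed at time f-1, is no longer failed at time f, and reaches a decision at some time g without receiving any message from f onward. The trace encoding (one label per time step, drawn from the hypothetical labels "FAIL", "RECV", "RECV_NULL", "DECIDE", and "IDLE") and the function name are assumptions made for this example, not the paper's formal model; in particular, the definition quantifies over extensions of a point, whereas this sketch inspects a single given trace.

```python
def decides_after_recovery(trace, f):
    """Return the decision time g if the process recovers at f and decides
    without a nonnull receive in between; otherwise return None.

    trace[t] is the (hypothetical) event label of one process at time t.
    """
    if f < 1 or f >= len(trace):
        return None
    if trace[f - 1] != "FAIL" or trace[f] == "FAIL":
        return None                      # no recovery happens at time f
    for g in range(f, len(trace)):
        if trace[g] == "RECV":           # a nonnull message arrived before the decision
            return None
        if trace[g] == "DECIDE":
            return g
    return None                          # the process never decides in this trace


if __name__ == "__main__":
    print(decides_after_recovery(["IDLE", "FAIL", "IDLE", "RECV_NULL", "DECIDE"], 2))
```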

Therefore, any C-system which supports independent recovery also supports weak indepen- 
dent recovery. 

Now, the proof of the impossibility of independent recovery becomes the proof of the
impossibility of weak independent recovery, by the following changes to the proof of Claim 1 in
the proof of Lemma 23: first, delete the line marked "(*)"; second, in the line marked "(**)",
change the phrase "independent recovery, there is k > i*' to "weak independent recovery, there 
is (csjfc) extending (esii)"* Therefore, we have 

Proposition 41 There is no C-system which supports weak independent recovery. □
Proposition 42 Any C-system $M$ which is subject to process failures and recovery, k-transit
bounded, and weakly terminating also supports weak independent recovery.

Proof: By Lemma 22, all processes are nonfailed and decided in any terminating point
in $M$. Further, by Theorem 12, any point in $M$ has a terminating extension in which no
nonnull message is received. Pick any point $(e,f) \in Pts(\mathcal{E})$ such that $Failed(e,f) \neq \emptyset$. There
is a terminating extension $(e_1,g)$ of $(e,f)$ in which no nonnull messages are received and all
processes are nonfailed and decided. $Failed(e,f) = Failed(e_1,f)$. Therefore, for every process
$p \in Failed(e_1,f)$, there is a time after $f$ in $e_1$ such that $p$ recovers (permanently) and $p$
decides: for all $p \in Failed(e_1,f)$, there is $h$ such that $f < h < g$ and $(e_1,h) \models FAILED_p$ and
$(e_1,h') \models \neg FAILED_p$, for $h' > h$ and

(if $p = c$) $(e_1,g) \models (ACCEPT_c \vee REFUSE_c)$ and $RECV(m,q) \notin e_1(g,p) - e_1(f-1,p)$, or

(if $p = m$) $(e_1,g) \models \bigwedge_{c \in C}(AWARD_c \vee REJECT_c)$ and $RECV(m,q) \notin e_1(g,p) - e_1(f-1,p)$.

Therefore, all $p$ may recover independently. □

An alternate proof to Theorem 30 is now the following: Fix any weakly terminating C-system
$M$ which is k-transit bounded and subject to process failures and recovery; by Proposition 42,
$M$ supports weak independent recovery, but Proposition 41 shows that no such system exists,
a contradiction.